Preface


This is the second edition of a book that presents an introduction to probability 
theory, statistics, random processes, and the analysis of systems with random 
inputs. It is written at a level that is suitable for junior and senior engineering 
students and presumes that the student is familiar with conventional methods of 
system analysis such as convolution and transform techniques. However, it may 
also serve graduate students and engineers as a concise review of material that 
they previously encountered in widely scattered sources. 

This edition differs from the first in several respects. In the first place, new 
material on statistics has been added to provide practical applications for some of 
the probability concepts developed in the first three chapters of the book. Expla- 
nations of the more difficult concepts have been expanded throughout the book, 
and many more examples to illustrate these concepts have been provided. Fur- 
thermore, there are now more exercises incorporated within the text to provide the 
reader with the opportunity to test his or her mastery of the concepts discussed in 
each section. Finally, the problems at the end of each chapter are entirely new 
and illustrate a wider range of applications than the previous edition. 

Since this is an engineering text, the treatment is heuristic rather than rigorous, 
and the student will find many examples of the application of these concepts to 
engineering problems. However, it is not completely devoid of the mathematical 
subtleties, and considerable attention has been devoted to pointing out some of 
the difficulties that make a more advanced study of the subject essential if one is 
to master it. The authors believe that the educational process is best served by 
repeated exposure to difficult subject matter; this text is intended to be the first 
exposure to probability and random processes and, we hope, not the last. Thus,
the book is not comprehensive, but deals selectively with those topics that the 
authors have found most useful in the solution of engineering problems. 

A brief discussion of some of the significant features of this book will help set 
the stage for a discussion of the various ways it can be used. Elementary concepts 
of discrete probability are introduced in Chapter 1, first from the intuitive standpoint
of the relative-frequency approach and then from the more rigorous standpoint
of axiomatic probability. Simple examples illustrate all these concepts and
are more meaningful to engineers than are the traditional examples of selecting 
red and white balls from urns. 

The concept of a random variable is introduced in Chapter 2 along with the 
ideas of probability distribution and density functions, mean values, and condi- 
tional probability. A significant feature of this chapter is an extensive discussion 
of many different probability density functions and the physical situations in 
which they may occur. Chapter 3 extends the random variable concept to situ- 
ations involving two or more random variables and introduces the concepts of 
statistical independence and correlation. 

Entirely new material on statistics appears in Chapter 4. Sampling theory, as 
applied to statistical estimation, is considered in some detail and a thorough dis- 
cussion of sample mean and sample variance is given. The distribution of the 
sample is described and the use of confidence intervals in making statistical deci- 
sions is both considered and illustrated by many examples of hypothesis testing. 
The problem of fitting smooth curves to experimental data is analyzed, and the 
use of linear regression is illustrated by practical examples. 

A general discussion of random processes and their classification is given in 
Chapter 5. The emphasis here is on selecting probability models that are useful in 
solving engineering problems. Accordingly, a great deal of attention is devoted to 
the physical significance of the various process classifications, with no attempt at 
mathematical rigor. A unique feature of this chapter, which is continued in sub- 
sequent chapters, is an introduction to the practical problem of estimating the 
mean of a random process from an observed sample function. 

Properties and applications of autocorrelation and crosscorrelation functions are 
discussed in Chapter 6. Many examples are presented in an attempt to develop 
some insight into the nature of correlation functions. The important problem of 
estimating autocorrelation functions is discussed in some detail. 

Chapter 7 turns to a frequency-domain representation of random processes by 
introducing the concept of spectral density. Unlike most texts, which simply de- 
fine spectral density as a Fourier transform of the correlation function, a more 
fundamental approach is adopted here in order to bring out the physical signifi- 
cance of the concept. This chapter is the most difficult one in the book, but the 
authors believe the material should be presented in this way. Instructors who wish 
to by-pass some of the more fundamental problems may omit Section 7-2 and 
bridge the gap by defining spectral density simply as the Fourier transform of the 
correlation function. 

Chapter 8 utilizes the concepts of correlation functions and spectral density to
analyze the response of linear systems to random inputs. In a sense, this chapter 
is a culmination of all that preceded it, and is particularly significant to engineers 
who must use these concepts. Hence, it contains a great many examples that are 
relevant to engineering problems and emphasizes the need for mathematical mod- 
els that are both realistic and manageable. 

Chapter 9 extends the concepts of systems analysis to consider systems that are 
optimum in some sense. Both the classical matched filter for known signals and 
the Wiener filter for random signals are considered from an elementary standpoint. 

In a more general vein, each chapter contains references that the reader may 
use to extend his or her knowledge. There is also a wide selection of problems at 
the end of each chapter. A solution manual for these problems is available to the 
instructor. Tables of functions, integrals, and other useful information that will 
aid the reader in solving the problems appear in a number of appendices at the 
end of the book. 

As an additional aid to learning and using the concepts and methods discussed 
in this text, there are exercises at the end of each major section. The reader should 
consider these exercises as part of the reading assignment and should make every 
effort to solve each one before going on to the next section. Answers are provided 
so that the reader may know when his or her efforts have been successful. It 
should be noted, however, that the answers to each exercise may not be listed in 
the same order as the questions. This is intended to provide an additional chal- 
lenge. The presence of these exercises should substantially reduce the number of 
additional problems that need to be assigned by the instructor. 

The material in this text is used at Purdue University in a one-semester, three- 
credit course offered in the Fall semester of the junior year. Not all sections of 
the text are used in this course, but at least 90% of it is covered in reasonable
detail. The sections usually omitted include 3-6, 5-6, 6-4, 6-9, 7-9, and
9-6, but other choices may be made at the discretion of the instructor. There are,
of course, many other ways in which the text material could be utilized. For 
example, a one-semester course with a more relaxed pace could be given by omit- 
ting all of Chapter 9 in addition to the sections noted above. For those schools on 
a quarter system, the material noted above could be covered in a four-credit 
course. Alternatively, if a three-credit course were desired, it is suggested that, in 
addition to the omissions noted above, Sections 1-5, 1-6, 1-7, 1-9, 2-6, 3-5,
7-2, 7-8, 7-10, 8-9, and all of Chapter 9 can be omitted if the instructor supplies
a few explanatory words to bridge the gaps. Obviously, there are also many other 
possibilities that are open to the experienced instructor. 

It is a pleasure for the authors to acknowledge the very substantial aid and 
encouragement that they have received from their colleagues and students. A com- 
plete list is too lengthy to include here, but it is appropriate to mention a few 
individuals who made valuable suggestions. In connection with the first edition, 
these individuals included Professors J. Y. S. Luh and P. A. Wintz and Dr. Lewis 
A. Thurman, then all at Purdue University. Furthermore, the careful and perceptive
reading of the preliminary manuscript of the first edition by Professor J. E.
Kemmerly of the California State University at Fullerton and Professor James L. 
Massey at the Swiss Federal Institute of Technology is gratefully acknowledged. 
Their many suggestions greatly improved that edition. 

In connection with the second edition, many valuable suggestions have been 
received from Professor E. W. Chandler at Marquette University and from the 
reviewers commissioned by our editor, Deborah Moore: Richard H. Williams, 
University of New Mexico; Richard Christiansen, Brigham Young University; 
Donald Healy, Georgia Institute of Technology; Hugh Van Landingham, Virginia 
Polytechnic; and Soheil A. Dianat, Rochester Institute of Technology. Special 
thanks are due to Dr. C. P. Cheng for preparing the solutions manual and for 
proofreading the manuscript. Last, but not least, we acknowledge the contribu- 
tions made by hundreds of students who have used and criticized the first edition 
of this text. 

February 1986 George R. Cooper 
Clare D. McGillem 
CHAPTER 1





Introduction to 
Probability 





1-1 Engineering Applications of Probability


Before embarking on a study of elementary probability theory, it is essential to 
motivate such a study by considering why probability theory is useful in the so- 
lution of engineering problems. This can be done in two different ways. The first 
is to suggest a viewpoint, or philosophy, concerning probability that emphasizes 
its universal physical reality rather than treating it as another mathematical disci- 
pline which may be useful occasionally. The second is to note some of the many 
different types of situations that arise in normal engineering practice in which the 
use of probability concepts is indispensable. 

A characteristic feature of probability theory is that it concerns itself with situ- 
ations that involve uncertainty in some form. The popular conception of this re- 
lates probability to such activities as tossing dice, drawing cards, and spinning 
roulette wheels. Because the rules of probability are not widely known, and be- 
cause such situations can become quite complex, the prevalent attitude is that 
probability theory is a mysterious and esoteric branch of mathematics that is ac- 
cessible only to trained mathematicians and is of limited value in the real world. 
Since probability theory does deal with uncertainty, another prevalent attitude is 
that a probabilistic treatment of physical problems is an inferior substitute for a 
more desirable exact analysis and is forced on the analyst by a lack of complete 
information. Both of these attitudes are false. 

Regarding the alleged difficulty of probability theory, it is doubtful there is any 
other branch of mathematics or analysis that is so completely based on such a 
small number of easily understood basic concepts. Subsequent discussion reveals
that the major body of probability theory can be deduced from only three axioms 
that are almost self-evident. Once these axioms and their applications are under- 
stood, the remaining concepts follow in a logical manner. 

The attitude that regards probability theory as a substitute for exact analysis 
stems from the current educational practice of presenting physical laws as deter- 
ministic, immutable, and strictly true under all circumstances. Thus, a law that 
describes the response of a dynamic system is supposed to predict that response 
precisely if the system excitation is known precisely. For example, Ohm's law 


v(t) = Ri(t) 


is assumed to be exactly true at every instant of time, and, on a macroscopic 
basis, this assumption may be well justified. On a microscopic basis, however, 
this assumption is patently false—a fact that is immediately obvious to anyone 
who has tried to connect a large resistor to the input of a high-gain amplifier and 
listened to the resulting noise. 

In the light of modern physics and our emerging knowledge of the nature of 
matter, the viewpoint that natural laws are deterministic and exact is untenable. 
They are, at best, a representation of the average behavior of nature. In many 
important cases this average behavior is close enough to that actually observed so 
that the deviations are unimportant. In such cases, the deterministic laws are ex- 
tremely valuable because they make it possible to predict system behavior with a 
minimum of effort. In other equally important cases, the random deviations may 
be significant—perhaps even more significant than the deterministic response. For 
these cases, analytic methods derived from the concepts of probability are essen- 
tial. 

From the above discussion, it should be clear that the so-called exact solution 
is not exact at all, but, in fact, represents an idealized special case that actually 
never arises in nature. The probabilistic approach, on the other hand, far from 
being a poor substitute for exactness, is actually the method that most nearly 
represents physical reality. Furthermore, it includes the deterministic result as a 
special case. 

It is now appropriate to discuss the types of situations in which probability 
concepts arise in engineering. The examples presented here emphasize situations 
that arise in systems studies, but they do serve to illustrate the essential point that
engineering applications of probability tend to be the rule rather than the excep- 
tion. 


Random input signals. In order for a physical system to perform a useful 
task, it is usually necessary that some sort of forcing function (the input signal) 
be applied to it. Input signals that have simple mathematical representations are
convenient for pedagogical purposes or for certain types of system analysis, but 
they seldom arise in actual applications. Instead, the input signal is more likely to 
involve a certain amount of uncertainty and unpredictability that justifies treating
it as a random signal. There are many examples of this: speech and music signals 
that serve as inputs to communication systems; random digits applied to a com- 
puter; random command signals applied to an aircraft flight control system; ran- 
dom signals derived from measuring some characteristic of a manufactured prod- 
uct, and used as inputs to a process control system; steering wheel movements in 
an automobile power-steering system; the sequence in which the call and operating 
buttons of an elevator are pushed; the number of vehicles passing various check- 
points in a traffic control system; outside and inside temperature fluctuations as 
inputs to a building heating and air-conditioning system; and many others.


Random disturbances. Many systems have unwanted disturbances applied to 
their input or output in addition to the desired signals. Such disturbances are al- 
most always random in nature and call for the use of probabilistic methods even 
if the desired signal does not. A few specific cases serve to illustrate several
different types of disturbances. If, for a first example, the output of a high-gain 
amplifier is connected to a loudspeaker, one frequently hears a variety of snaps, 
crackles, and pops. This random noise arises from thermal motion of the conduc- 
tion electrons in the amplifier input circuit or from random variations in the num- 
ber of electrons (or holes) passing through the transistors. It is obvious that one 
cannot hope to calculate the value of this noise at every instant of time since this 
value represents the combined effects of literally billions of individual moving 
charges. It is possible, however, to calculate the average power of this noise, its 
frequency spectrum, and even the probability of observing a noise value larger 
than some specified value. As a practical matter, these quantities are more impor- 
tant in determining the quality of the amplifier than is a knowledge of the instan- 
taneous waveforms. 
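
As a rough numerical sketch of this point, the fragment below generates simulated noise samples and estimates the average power and the fraction of samples that exceed a chosen level. The noise is taken to be zero-mean Gaussian, which is an assumption made here only for illustration and is not stated in the text; the names `estimate_noise_stats`, `sigma`, and `threshold` are likewise hypothetical.

```python
import random

def estimate_noise_stats(sigma, threshold, n_samples=100_000, seed=0):
    """Estimate average power and Pr(noise > threshold) from simulated samples.

    Assumption (not from the text): the noise is zero-mean Gaussian
    with standard deviation sigma.
    """
    rng = random.Random(seed)
    samples = [rng.gauss(0.0, sigma) for _ in range(n_samples)]
    # Average power is the mean-square value of the samples.
    avg_power = sum(x * x for x in samples) / n_samples
    # Fraction of samples larger than the specified value.
    frac_above = sum(x > threshold for x in samples) / n_samples
    return avg_power, frac_above

avg_power, frac_above = estimate_noise_stats(sigma=1.0, threshold=2.0)
print(f"estimated average power: {avg_power:.3f}")
print(f"estimated fraction above threshold: {frac_above:.4f}")
```

No individual sample value is predicted here; only averages over many samples are computed, which is the kind of description the paragraph above has in mind.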

As a second example, consider a radio or television receiver. In addition to 
noise generated within the receiver by the mechanisms noted, there is random 
noise arriving at the antenna. This results from distant electrical storms, man- 
made disturbances, radiation from space, or thermal radiation from surrounding 
objects. Hence, even if perfect receivers and amplifiers were available, the re- 
ceived signal would be combined with random noise. Again, the calculation of 
such quantities as average power and frequency spectrum may be more significant 
than the determination of instantaneous value. 

A different type of system is illustrated by a large radar antenna, which may be 
pointed in any direction by means of an automatic control system. The wind blow- 
ing on the antenna produces random forces that must be compensated for by the 
control system. Since the compensation is never perfect, there is always some 
random fluctuation in the antenna direction; it is important to be able to calculate 
the effective value and frequency content of this fluctuation. 

A still different situation is illustrated by an airplane flying in turbulent air, a 
ship sailing in stormy seas, or an army truck traveling over rough terrain. In all
these cases, random disturbing forces, acting on complex mechanical systems,
interfere with the proper control or guidance of the system. It is essential to de- 
termine how the system responds to these random input signals. 


Random system characteristics. The system itself may have characteristics 
that are unknown and that vary in a random fashion from time to time. Some 
typical examples are: aircraft in which the load (that is, the number of passengers 
or the weight of the cargo) varies from flight to flight; troposcatter communication 
systems in which the path attenuation varies radically from moment to moment; 
an electric power system in which the load (that is, the amount of energy being 
used) fluctuates randomly; and a telephone system in which the number of users 
changes from instant to instant. 

There are also many electronic systems in which the parameters may be ran- 
dom. For example, it is customary to specify the properties of many solid-state 
devices such as diodes, transistors, digital gates, shift registers, flip-flops, etc. by 
listing a range of values for the more important items. The actual values of the
parameters are random quantities that lie somewhere in this range but are not 
known a priori. 


System reliability. All systems are composed of many individual elements, 
and one or more of these elements may fail, thus causing the entire system, or 
part of the system, to fail. The times at which such failures will occur are un- 
known, but it is often possible to determine the probability of failure for the 
individual elements and from these to determine the "mean time to failure" for
the system. Such reliability studies are deeply involved with probability and are 
extremely important in engineering design. As systems become more complex, 
more costly, and contain larger numbers of elements, the problems of reliability 
become more difficult and take on added significance. 
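
One simple way to see how element failure data can lead to a system figure is sketched below. It assumes, purely for illustration, that the elements fail independently, that each has a constant failure rate (an exponentially distributed lifetime), and that the failure of any one element causes the system to fail. None of these assumptions comes from the text, and the failure rates shown are hypothetical.

```python
import math

def series_system_reliability(failure_rates, t):
    """Probability that every element is still working at time t.

    Assumptions: independent elements, constant failure rates,
    and any single element failure causes system failure.
    """
    return math.prod(math.exp(-rate * t) for rate in failure_rates)

def series_system_mttf(failure_rates):
    """Mean time to failure under the same assumptions: 1 / (sum of rates)."""
    return 1.0 / sum(failure_rates)

rates = [1e-4, 2e-4, 5e-4]  # hypothetical failures per hour for three elements
mttf = series_system_mttf(rates)
print(f"system mean time to failure: {mttf:.0f} hours")
print(f"reliability at 1000 hours: {series_system_reliability(rates, 1000.0):.3f}")
```

Under these assumptions the system failure rate is the sum of the element rates, so adding elements can only shorten the mean time to failure, which is consistent with the remark that reliability problems grow with the number of elements.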


Quality control. An important method of improving system reliability is to 
improve the quality of the individual elements, and this can often be done by an 
inspection process. As it may be too costly to inspect every element after every 
step during its manufacture, it is necessary to develop rules for inspecting ele- 
ments selected at random. These rules are based on probabilistic concepts and 
serve the valuable purpose of maintaining the quality of the product with the least 
expense. 


Information theory. A major objective of information theory is to provide a 
quantitative measure for the information content of messages such as printed 
pages, speech, pictures, graphical data, numerical data, or physical observations 
of temperature, distance, velocity, radiation intensity, and rainfall. This quantita- 
tive measure is necessary in order to be able to provide communication channels 
that are both adequate and efficient for conveying this information from one place 
to another. Since such messages and observations are almost invariably unknown
in advance and random in nature, they can be described only in terms of proba- 
bility. Hence, the appropriate information measure is a probabilistic one. Further- 
more, the communication channels are subject to random disturbances (noise) that 
limit their ability to convey information, and again a probabilistic description is 
required. 

It should be clear from the above partial listing that almost any engineering 
endeavor involves a degree of uncertainty or randomness that makes the use of 
probabilistic concepts an essential tool for the present-day engineer. In the case of 
system analysis, it is necessary to have some description of random signals and 
disturbances. There are two general methods of describing random signals math- 
ematically. The first, and most basic, is a probabilistic description in which the 
random quantity is characterized by a probability model. This method is discussed 
later in this chapter. 

The probabilistic description of random signals cannot be used directly in sys- 
tem analysis since it tells very little about how the random signal varies with time 
or what its frequency spectrum is. It does, however, lead to the statistical descrip- 
tion of random signals, which is useful in system analysis. In this case the random 
signal is characterized by a statistical model, which consists of an appropriate set 
of average values such as the mean, variance, correlation function, spectral den- 
sity, and others. These average values represent a less precise description of the 
random signal than that offered by the probability model, but they are more useful 
for system analysis because they can be computed by using straightforward and 
relatively simple methods. Some of the statistical averages are discussed in sub- 
sequent chapters. 

There are many steps that need to be taken before it is possible to apply the 
probabilistic and statistical concepts to system analysis. In order that the reader 
may understand that even the most elementary steps are important to the final 
objective, it is desirable to outline these steps briefly. The first step is to introduce 
the concepts of probability by considering discrete random events. These concepts 
are then extended to continuous random variables and subsequently to random 
functions of time. Finally, several of the average values associated with random 
signals are introduced. At this point, the tools are available to consider ways of 
analyzing the response of linear systems to random inputs. 


1-2 Random Experiments and Events


The concepts of experiment and event are fundamental to an understanding of 
elementary probability concepts. An experiment is some action that results in an 
outcome. A random experiment is one in which the outcome is uncertain before 
the experiment is performed. Although there is a precise mathematical definition 
of a random experiment, a better understanding may be gained by listing some 
examples of well-defined random experiments and their possible outcomes. This
is done in Table 1-1. It should be noted, however, that the possible outcomes
often may be defined in several different ways depending upon the wishes of the 
experimenter. The initial discussion is concerned with a single performance of a 
well-defined experiment. This single performance is referred to as a trial. 

An important concept in connection with random events is that of equally likely 
events. For example, if we toss a coin we expect that the event of getting a head 
and the event of getting a tail are equally likely. Likewise, if we roll a die we
expect that the events of getting any number from 1 to 6 are equally likely. Also, 
when a card is drawn from a deck, each of the 52 cards is equally likely. A term 
that is often used synonymously with the concept of equally likely events is
that of selected at random. For example, when we say that a card is selected at 
random from a deck, we are implying that all cards in the deck are equally likely 
to have been chosen. In general, we assume that the outcomes of an experiment 
are equally likely unless there is some clear physical reason why they should not 
be. In the discussions that follow, there will be examples of events that are as- 
sumed to be equally likely and events that are not assumed to be equally likely. 
The reader should clearly understand the physical reasons for the assumptions in 
both cases. 

It is also important to distinguish between elementary events and composite 
events. An elementary event is one for which there is only one outcome. Exam- 
ples of elementary events include such things as tossing a coin or rolling a die 
when the events are defined in a specific way. When a coin is tossed, the event 
of getting a head or the event of getting a tail can be achieved in only one way. 
Likewise, when a die is rolled the event of getting any integer from 1 to 6 can be
achieved in only one way. Hence, in both cases, the defined events are elementary 
events. On the other hand, it is possible to define events associated with rolling a 
die that are not elementary. For example, let one event be that of obtaining an 
even number while another event is that of obtaining an odd number. In this case, 
each event can be achieved in three different ways and, hence, these events are 
composite. 
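
The distinction can be sketched in a few lines of Python by representing each event as a set of die outcomes. The event names and the helper `classify` are illustrative choices made here, not notation from the text.

```python
# Outcomes of rolling one die.
outcomes = {1, 2, 3, 4, 5, 6}

# Each event is a set of outcomes. The names are illustrative only.
events = {
    "a four": {4},
    "even number": {n for n in outcomes if n % 2 == 0},
    "odd number": {n for n in outcomes if n % 2 == 1},
}

def classify(event):
    """A nonempty event with exactly one outcome is elementary; otherwise composite."""
    return "elementary" if len(event) == 1 else "composite"

for name, event in events.items():
    print(f"{name}: {sorted(event)} -> {classify(event)}, {len(event)} way(s)")
```

The even and odd events each contain three outcomes, matching the three different ways mentioned above.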

Table 1-1. Possible Experiments and Their Outcomes.

Experiment             Possible Outcomes
Flipping a coin        Heads (H), tails (T)
Throwing a die         1, 2, 3, 4, 5, 6
Drawing a card         Any of the 52 possible cards
Observing a voltage    Greater than 0, less than 0
Observing a voltage    Greater than V, less than V
Observing a voltage    Between V1 and V2, not between V1 and V2


There are many different random experiments in which the events can be defined
to be either elementary or composite. For example, when a card is selected
at random from a deck of 52 cards, there are 52 elementary events corresponding
to the selection of each of the cards. On the other hand, the event of selecting a
heart is a composite event containing 13 different outcomes. Likewise, the event
of selecting an ace is a composite event containing 4 outcomes. Clearly, there are
many other ways in which composite events could be defined.
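
The counts quoted for the card example can be checked by enumerating a deck. The following is a minimal sketch; the rank and suit labels are arbitrary, and the last event (an ace or a heart) is an extra composite event added here for illustration.

```python
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["hearts", "diamonds", "clubs", "spades"]

# Each (rank, suit) pair is one elementary event: the selection of that card.
deck = list(product(RANKS, SUITS))

# Composite events are collections of several cards.
hearts = {card for card in deck if card[1] == "hearts"}
aces = {card for card in deck if card[0] == "A"}
ace_or_heart = hearts | aces  # the ace of hearts is counted only once

print(len(deck), len(hearts), len(aces), len(ace_or_heart))
```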

When the outcomes of an experiment are countable (that is, they can
be put in one-to-one correspondence with the integers), the outcomes are said to 
be discrete. All of the examples discussed above represent discrete outcomes. 
However, there are many experiments in which the outcomes are not countable. 
For example, if a random voltage is observed, and the outcome taken to be the 
value of the voltage, there may be an infinite and noncountable number of possible 
values that can be obtained. In this case, the outcomes are said to form a contin- 
uum. The concept of an elementary event does not apply in this case. 

It is also possible to conduct more complicated experiments with more compli- 
cated sets of events. The experiment may consist of tossing ten coins, and it is 
apparent in this case that there are many different possible outcomes, each of 
which may be an event. Another situation, which has more of an engineering 
flavor, is that of a telephone system having 10,000 telephones connected to it. At 
any given time, a possible event is that 2000 of these telephones are in use. 
Obviously, there are a great many other possible events. 
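
To give a feeling for how quickly the number of outcomes grows, the sketch below enumerates the ten-coin experiment and counts the outcomes in one composite event. For the telephone example it assumes that an outcome is the particular set of telephones in use, which is only one possible way of defining outcomes and is an assumption made here.

```python
import math
from itertools import product

# Every ordered sequence of heads and tails from ten coins is one outcome.
ten_coin_outcomes = list(product("HT", repeat=10))
n_outcomes = len(ten_coin_outcomes)

# A composite event: exactly three of the ten coins show heads.
exactly_three_heads = sum(1 for o in ten_coin_outcomes if o.count("H") == 3)

# Telephone example, assuming an outcome is the specific set of phones in use:
# the event "2000 of 10,000 telephones are in use" then contains this many outcomes.
ways_2000_in_use = math.comb(10_000, 2000)

print(f"ten coins: {n_outcomes} outcomes, {exactly_three_heads} with exactly three heads")
print(f"telephone event contains a number of outcomes with {len(str(ways_2000_in_use))} digits")
```

Enumeration is practical for ten coins but hopeless for the telephone system, which is one reason a mathematical structure for probability is needed.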

If the outcome of an experiment is uncertain before the experiment is per- 
formed, the possible outcomes are random events. To each of these events it is 
possible to assign a number, called the probability of that event, and this number 
is a measure of how likely that event is. Usually, these numbers are assumed, the 
assumed values being based on our intuition about the experiment. For example, 
if we toss a coin, we would expect that the possible outcomes of heads and tails 
would be equally likely. Therefore, we would assume the probabilities of these 
two events to be the same. 


1-3 Definitions of Probability


One of the most serious stumbling blocks in the study of elementary probability 
is that of arriving at a satisfactory definition of the term "probability." There are,
in fact, four or five different definitions for probability that have been proposed 
and used with varying degrees of success. They all suffer from deficiencies in 
concept or application. Ironically, the most successful "definition" leaves the
term probability undefined. 

Of the various approaches to probability, the two that appear to be most useful 
are the relative-frequency approach and the axiomatic approach. The relative-fre- 
quency approach is useful because it attempts to attach some physical significance 
to the concept of probability and, thereby, makes it possible to relate probabilistic 
concepts to the real world. Hence, the application of probability to engineering 
problems is almost always accomplished by invoking the concepts of relative fre- 
quency, even when the engineer may not be conscious that he is doing so. 

The limitation of the relative-frequency approach is the difficulty of using it to
deduce the appropriate mathematical structure for situations that are too complicated
to be analyzed readily by physical reasoning. This is not to imply that this
approach cannot be used in such situations, for it can, but it does suggest that 
there may be a much easier way to deal with these cases. The easier way turns 
out to be the axiomatic approach. 

The axiomatic approach treats the probability of an event as a number that 
satisfies certain postulates but is otherwise undefined. Whether or not this number 
relates to anything in the real world is of no concern in developing the mathemat- 
ical structure that evolves from these postulates. Engineers may object to this 
approach as being too artificial and too removed from reality, but they should 
remember that the whole body of circuit theory was developed in essentially the 
same way. In the case of circuit theory, the basic postulates are Kirchhoff's laws 
and the conservation of energy. The same mathematical structure emerges regard- 
less of what physical quantities are identified with the abstract symbols—or even 
if no physical quantities are associated with them. It is the task of the engineer to
relate this mathematical structure to the real world in a way that is admittedly not 
exact, but that leads to useful solutions to real problems. 

From the above discussion, it appears that the most useful approach to proba- 
bility for engineers is a two-pronged one, in which the relative-frequency concept 
is employed in order to relate simple results to physical reality, and the axiomatic 
approach is employed to develop the appropriate mathematics for more compli- 
cated situations. It is this philosophy that is presented here. 


1—4 The Relative-Frequency Approach 


As its name implies, the relative-frequency approach to probability is closely 
linked to the frequency of occurrence of the defined events. For any given event, 
the frequency of occurrence is used to define a number called the probability of 
that event and this number is a measure of how likely that event is. Usually, these 
numbers are assumed, the assumed values being based on our intuition about the 
experiment or on the assumption that the events are equally likely. 

In order to make this concept more precise, consider an experiment that is 
performed N times and for which there are four possible outcomes that are considered
to be the elementary events A, B, C, and D. Let $N_A$ be the number of times
that event A occurs, with a similar notation for the other events. It is clear that


$$N_A + N_B + N_C + N_D = N \tag{1-1}$$


We now define the relative frequency of A, r(A), as

$$r(A) = \frac{N_A}{N}$$

From (1-1) it is apparent that
r(A) + r(B) + r(C) + r(D) = 1 (1-2) 


Now imagine that N increases without limit. When a phenomenon known as statistical
regularity applies, the relative frequency r(A) tends to stabilize and approach
a number, Pr (A), that can be taken as the probability of the elementary
event A. That is,

$$\Pr(A) = \lim_{N \to \infty} r(A) \tag{1-3}$$


From the relation given above, it follows that 
$$\Pr(A) + \Pr(B) + \Pr(C) + \cdots + \Pr(M) = 1 \tag{1-4}$$


and we can conclude that the sum of the probabilities of all of the mutually exclu- 
sive events associated with a given experiment must be unity. 
These concepts can be summarized by the following set of statements: 


1. 0 ≤ Pr (A) ≤ 1.

2. Pr (A) + Pr (B) + Pr (C) + . . . + Pr (M) = 1, for a complete set of
mutually exclusive events.

3. An impossible event is represented by Pr (A) = 0. 

4. A certain event is represented by Pr (A) = 1. 


In order to make some of these ideas more specific, consider the following 
hypothetical example. Assume that a large bin contains an assortment of resistors 
of different sizes, which are thoroughly mixed. In particular, let there be 100 
resistors having a marked value of 1 Ω, 500 resistors marked 10 Ω, 150 resistors
marked 100 Ω, and 250 resistors marked 1000 Ω. Someone reaches into the bin
and pulls out one resistor at random. There are now four possible outcomes cor- 
responding to the value of the particular resistor selected. We desire to determine 
the probability of each of these events. In order to do this, we assume that the 
probability of each event is proportional to the number of resistors in the bin 
corresponding to that event. Since there are 1000 resistors in the bin all together, 
the resulting probabilities are 


$$\Pr(1\ \Omega) = \frac{100}{1000} = 0.1 \qquad \Pr(10\ \Omega) = \frac{500}{1000} = 0.5$$
$$\Pr(100\ \Omega) = \frac{150}{1000} = 0.15 \qquad \Pr(1000\ \Omega) = \frac{250}{1000} = 0.25$$


Note that these probabilities are all positive, less than 1, and do add up to 1. 
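
The assignment of probabilities in proportion to the resistor counts, and the idea that relative frequencies stabilize near those values as the number of trials grows, can be sketched numerically. The following is only an illustration under stated assumptions: drawing is simulated with replacement, the seed and the number of trials are arbitrary choices, and the names `counts`, `probs`, and `relative_frequencies` are hypothetical.

```python
import random
from fractions import Fraction

# Resistor counts from the bin example in the text.
counts = {"1 ohm": 100, "10 ohm": 500, "100 ohm": 150, "1000 ohm": 250}
total = sum(counts.values())

# Assigned probabilities: proportional to the number of resistors of each value.
probs = {value: Fraction(n, total) for value, n in counts.items()}

def relative_frequencies(n_trials, seed=0):
    """Simulate n_trials draws (with replacement) and return r(A) = N_A / N."""
    rng = random.Random(seed)  # fixed seed: an illustrative assumption
    values = list(counts)
    weights = [counts[v] for v in values]
    draws = rng.choices(values, weights=weights, k=n_trials)
    return {v: draws.count(v) / n_trials for v in values}

freqs = relative_frequencies(100_000)
for value in counts:
    # Assigned probability next to the simulated relative frequency.
    print(value, float(probs[value]), round(freqs[value], 3))
```

With a large number of trials the simulated relative frequencies should lie close to 0.1, 0.5, 0.15, and 0.25, although any finite run only approximates the limit in (1-3).
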
Many times one is interested in more than one event at a time. If a coin is 
tossed twice, one may wish to determine the probability that a head will occur on 
both tosses. Such a probability is referred to as a joint probability. In this partic- 
ular case, one assumes that all four possible outcomes (HH, HT, TH, and TT) are 
equally likely and, hence, the probability of each is one quarter. In a more general
case the situation is not this simple, so it is necessary to look at a more compli- 
cated situation in order to deduce the true nature of joint probability. The notation 
employed is Pr (A,B) and signifies the probability of the joint occurrence of events 
A and B. 

Consider again the bin of resistors and specify that in addition to having differ- 
ent resistance values, they also have different power ratings. Let the different 
power ratings be 1 W, 2 W, and 5 W; the number having each rating is indicated 
in Table 1-2. 

Before using this example to illustrate joint probabilities, consider the probabil- 
ity (now referred to as a marginal probability) of selecting a resistor having a 
given power rating without regard to its resistance value. From the totals given in 
the right-hand column, it is clear that these probabilities are 


$$\Pr(1\ \text{W}) = \frac{440}{1000} = 0.44 \qquad \Pr(2\ \text{W}) = \frac{200}{1000} = 0.20$$
$$\Pr(5\ \text{W}) = \frac{360}{1000} = 0.36$$


We now ask what the joint probability is of selecting a resistor of 10 Ω having
a 5-W power rating. Since there are 150 such resistors in the bin, this joint probability
is clearly

$$\Pr(10\ \Omega,\ 5\ \text{W}) = \frac{150}{1000} = 0.15$$

The eleven other joint probabilities can be determined in a similar way. Note that
some of the joint probabilities are zero [for example, Pr (1 Ω, 5 W) = 0] simply
because a particular combination of resistance and power does not exist. 

It is necessary at this point to relate the joint probabilities to the marginal prob- 
abilities. In the example of tossing a coin two times, the relationship is simply a 
product. That is, 

$$\Pr(H,H) = \Pr(H)\,\Pr(H) = \frac{1}{2} \times \frac{1}{2} = \frac{1}{4}$$


Table 1-2. Resistance Values and Power Ratings.

                        Resistance Values
Power Rating     1 Ω    10 Ω    100 Ω    1000 Ω    Totals
1 W               50     300       90         0       440
2 W               50      50        0       100       200
5 W                0     150       60       150       360
Totals           100     500      150       250      1000

But this relationship is obviously not true for the resistor bin example. Note that

$$\Pr(5\ \text{W}) = \frac{360}{1000} = 0.36$$

and it was previously shown that

$$\Pr(10\ \Omega) = 0.5$$

Thus,

$$\Pr(10\ \Omega)\,\Pr(5\ \text{W}) = 0.5 \times 0.36 = 0.18 \neq \Pr(10\ \Omega,\ 5\ \text{W}) = 0.15$$


and the joint probability is not the product of the marginal probabilities. 

In order to clarify this point, it is necessary to introduce the concept of conditional
probability. This is the probability of one event A, given that another event
B has occurred; it is designated as Pr (A | B). In terms of the resistor bin, consider
the conditional probability of selecting a 10-Ω resistor when it is already known
that the chosen resistor is 5 W. Since there are 360 5-W resistors, and 150 of
these are 10 Ω, the required conditional probability is

$$\Pr(10\ \Omega \mid 5\ \text{W}) = \frac{150}{360} = 0.417$$

Now consider the product of this conditional probability and the marginal probability
of selecting a 5-W resistor.

$$\Pr(10\ \Omega \mid 5\ \text{W})\,\Pr(5\ \text{W}) = 0.417 \times 0.36 = 0.15 = \Pr(10\ \Omega,\ 5\ \text{W})$$

It is seen that this product is indeed the joint probability.

The same result can also be obtained another way. Consider the conditional
probability

$$\Pr(5\ \text{W} \mid 10\ \Omega) = \frac{150}{500} = 0.30$$

since there are 150 5-W resistors out of the 500 10-Ω resistors. Then form the
product

$$\Pr(5\ \text{W} \mid 10\ \Omega)\,\Pr(10\ \Omega) = 0.30 \times 0.5 = \Pr(10\ \Omega,\ 5\ \text{W}) \tag{1-5}$$


Again, the product is the joint probability. 
The foregoing ideas concerning joint probability can be summarized in the gen- 
eral equation 


Pr (A,B) = Pr (A | B) Pr (B) = Pr (B | A) Pr (A) (1-6) 


which indicates that the joint probability of two events can always be expressed 
as the product of the marginal probability of one event and the conditional prob- 
ability of the other event given the first event. 
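
The relation (1-6) can be checked directly against the counts of Table 1-2. The sketch below is a minimal illustration, not part of the original development; the helper names (`joint`, `marginal_power`, and so on) are hypothetical, and exact fractions are used so that no rounding enters the comparison.

```python
from fractions import Fraction

# Counts from Table 1-2: rows are power ratings, columns are resistance values.
resistances = ["1", "10", "100", "1000"]  # ohms
table = {
    "1 W": [50, 300, 90, 0],
    "2 W": [50, 50, 0, 100],
    "5 W": [0, 150, 60, 150],
}
total = sum(sum(row) for row in table.values())  # 1000 resistors in all

def joint(r, w):
    """Pr(r, w): fraction of all resistors having resistance r and rating w."""
    return Fraction(table[w][resistances.index(r)], total)

def marginal_power(w):
    return Fraction(sum(table[w]), total)

def marginal_resistance(r):
    i = resistances.index(r)
    return Fraction(sum(row[i] for row in table.values()), total)

def cond_resistance_given_power(r, w):
    """Pr(r | w): count within the row for rating w."""
    return Fraction(table[w][resistances.index(r)], sum(table[w]))

def cond_power_given_resistance(w, r):
    """Pr(w | r): count within the column for resistance r."""
    i = resistances.index(r)
    return Fraction(table[w][i], sum(row[i] for row in table.values()))

# Both factorizations in (1-6) reproduce the joint probability.
lhs = cond_resistance_given_power("10", "5 W") * marginal_power("5 W")
rhs = cond_power_given_resistance("5 W", "10") * marginal_resistance("10")
print(joint("10", "5 W"), lhs, rhs)

# The product of the marginals alone does not.
print(marginal_resistance("10") * marginal_power("5 W"))
```
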
We now return to the coin-tossing problem, in which it is indicated that the
joint probability can be obtained as the product of two marginal probabilities. 
Under what conditions will this be true? From equation (1—6) it appears that this 
can be true if 


Pr (A | B) = Pr (A) and Pr(B | A) = Pr (B) 


These statements imply that the probability of event A does not depend upon 
whether or not event B has occurred. This is certainly true in coin tossing, since 
the outcome of the second toss cannot be influenced in any way by the outcome 
of the first toss. Such events are said to be statistically independent. More pre- 
cisely, two random events are statistically independent if and only if 


Pr (A,B) = Pr (A) Pr (B) (1-7) 


The preceding paragraphs provide a very brief discussion of many of the basic 
concepts of discrete probability. They have been presented in a heuristic fashion 
without any attempt to justify them mathematically. Instead, all of the probabili- 
ties have been formulated by invoking the concepts of relative frequency and 
equally likely events in terms of specific numerical examples. It is clear from 
these examples that it is not difficult to assign reasonable numbers to the proba- 
bilities of various events (by employing the relative-frequency approach) when the 
physical situation is not very involved. It should also be apparent, however, that 
such an approach might become unmanageable when there are many possible out- 
comes to any experiment and many different ways of defining events. This is 
particularly true when one attempts to extend the results for the discrete case to 
the continuous case. It becomes necessary, therefore, to reconsider all of the 
above ideas in a more precise manner and to introduce a measure of mathematical 
rigor that provides a more solid footing for subsequent extensions. 





Exercise 1—4.1 


a) A box contains 25 transistors, of which 8 are known to be bad. 
A transistor is selected at random and tested. What is the prob- 
ability that it is bad? 


b) If the first transistor tests bad, what is the probability that a 
second transistor selected at random will also be bad? 


c) If the first transistor tests good, what is the probability that a 
second transistor selected at random will be bad? 


Answers: 1/3, 8/25, 7/24 
(Note: In the exercise above, and in others throughout the book, answers
are not necessarily given in the same order as the questions.)


Exercise 1—4.2 


A traffic survey on a busy highway reveals that one out of every three 
vehicles is a truck. This survey also established that one-tenth of all 
the automobiles are unsafe to drive and one-twentieth of all the trucks 
are unsafe to drive. 


a) What is the probability that the next vehicle to pass a given 
point will be an unsafe truck? 


b) What is the probability that the next vehicle will be a truck, 
given that it is unsafe? 


c) What is the probability that the next vehicle that passes a given
point will be a truck, given that the previous vehicle was an 
automobile? 


Answers: 1/5, 1/3, 1/60 





1—5 Elementary Set Theory 


The more precise formulation mentioned in Section 1—4 is accomplished by put- 
ting the ideas introduced in that section into the framework of the axiomatic ap- 
proach. In order to do this, however, it is first necessary to review some of the 
elementary concepts of set theory. 

A set is a collection of objects known as elements. It will be designated as 


$$A = \{\alpha_1, \alpha_2, \ldots, \alpha_n\} \tag{1-8}$$

where the set is A and the elements are $\alpha_1, \ldots, \alpha_n$. The set A may consist of
the integers from 1 to 6 so that $\alpha_1 = 1$, $\alpha_2 = 2$, . . . , $\alpha_6 = 6$ are the elements.


A subset of A is any set all of whose elements are also elements of A. B = {1,2,3} 
is a subset of the set A = {1, 2, 3, 4, 5, 6}. The general notation for indicating
that B is a subset of A is B ⊂ A. Note that every set is a subset of itself.

All sets of interest in probability theory have elements taken from the largest 
set called a space and designated as S. Hence, all sets will be subsets of the space
S. The relation of S and its subsets to probability will become clear shortly, but
in the meantime, an illustration may be helpful. Suppose that the elements of a
space consist of the six faces of a die, and that these faces are designated as 1, 2,
. . . , 6. Thus,


S = {1, 2, 3, 4, 5, 6}


There are many ways in which subsets might be formed, depending upon the 
number of elements belonging to each subset. In fact, if one includes the null set 
or empty set, which has no elements in it and is denoted by ∅, there are 2^6 = 64
subsets and they may be denoted as


∅, {1}, . . . , {6}, {1, 2}, {1, 3}, . . . , {5, 6}, {1, 2, 3}, . . . , S


In general, if S contains n elements, then there are 2^n subsets. The proof of this
is left as an exercise for the student. 
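
Although the proof is left to the student, the count is easy to confirm by enumeration for the die example. The sketch below is an illustrative assumption about how one might list the subsets (by size, using the standard library); the function name `all_subsets` is hypothetical.

```python
from itertools import combinations

def all_subsets(space):
    """Return every subset of `space`, from the empty set up to the space itself."""
    elements = sorted(space)
    subsets = []
    for size in range(len(elements) + 1):
        for combo in combinations(elements, size):
            subsets.append(frozenset(combo))
    return subsets

S = {1, 2, 3, 4, 5, 6}
subsets = all_subsets(S)
print(len(subsets), 2 ** len(S))  # the two counts should agree
```
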

One of the reasons for using set theory to develop probability concepts is that 
the important operations are already defined for sets and have simple geometric 
representations that aid in visualizing and understanding these operations. The 
geometric representation is the Venn diagram in which the space S is represented
by a square and the various sets are represented by closed plane figures. For 
example, the Venn diagram shown in Figure 1—1 shows that B is a subset of A 
and that C is a subset of B (and also of A). The various operations are now defined 
and represented by Venn diagrams. 


Equality. Set A equals set B iff (if and only if) every element of A is an 
element of B and every element of B is an element of A. Thus 
A = B iff A ⊂ B and B ⊂ A


The Venn diagram is obvious and will not be shown. 





Figure 1-1. Venn diagram for C ⊂ B ⊂ A.

Figure 1-2. The sum of two sets, A ∪ B.


Sums. The sum or union of two sets is a set consisting of all the elements
that are elements of A or of B or of both. This is shown in Figure 1-2. Since the
associative law holds, the sum of more than two sets can be written without parentheses.
That is,


(A ∪ B) ∪ C = A ∪ (B ∪ C) = A ∪ B ∪ C


The commutative law also holds, so that


A ∪ B = B ∪ A
A ∪ A = A
A ∪ ∅ = A
A ∪ S = S
A ∪ B = A, if B ⊂ A


Products. The product or intersection of two sets is the set consisting of all 
the elements that are common to both sets. It is designated as A ∩ B and is
illustrated in Figure 1-3. A number of results apparent from the Venn diagram
are 





Figure 1-3. The intersection of two sets, A ∩ B.

Figure 1-4. Intersections for three sets.


A ∩ B = B ∩ A (Commutative law)
A ∩ A = A
A ∩ ∅ = ∅
A ∩ S = A
A ∩ B = B, if B ⊂ A


If there are more than two sets involved in the product, the Venn diagram of 
Figure 1—4 is appropriate. From this it is seen that 
(A ∩ B) ∩ C = A ∩ (B ∩ C) = A ∩ B ∩ C (Associative law)
A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C) (Distributive law)

Two sets A and B are mutually exclusive or disjoint if A ∩ B = ∅. Representations
of such sets in the Venn diagram do not overlap.


Complement. The complement of a set A is a set containing all the elements 
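of S that are not in A. It is denoted $\bar{A}$ and is shown in Figure 1-5. It is clear that


$\overline{\varnothing} = S$
$\bar{S} = \varnothing$
$\overline{(\bar{A})} = A$
$A \cup \bar{A} = S$
$A \cap \bar{A} = \varnothing$
$\bar{A} \subset \bar{B}$, if $B \subset A$
$\bar{A} = \bar{B}$, if $A = B$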


Two additional relations that are usually referred to as DeMorgan's laws are

$\overline{A \cup B} = \bar{A} \cap \bar{B}$
$\overline{A \cap B} = \bar{A} \cup \bar{B}$

Figure 1-5. The complement of A.


Differences. The difference of two sets, A - B, is a set consisting of the
elements of A that are not in B. This is shown in Figure 1-6. The difference may
also be expressed as

$A - B = A \cap \bar{B} = A - (A \cap B)$

The notation (A - B) is often read as "A take away B." The following results
are also apparent from the Venn diagram:

(A - B) ∪ B ≠ A
(A ∪ A) - A = ∅
A ∪ (A - A) = A
A - ∅ = A
A - S = ∅
S - A = $\bar{A}$





Figure 1-6. The difference of two sets. 
Note that when differences are involved, the parentheses cannot be omitted.

It is desirable to illustrate all of the above operations with a specific example. 
In order to do this, let the elements of the space S be the integers from 1 to 6, as 
before: 


S = {1, 2, 3, 4, 5, 6}

and define certain sets as

A = {2, 4, 6}, B = {1, 2, 3, 4}, C = {1, 3, 5}

From the definitions just presented, it is clear that

(A ∪ B) = {1, 2, 3, 4, 6}, (B ∪ C) = {1, 2, 3, 4, 5}
A ∩ B = {2, 4}, B ∩ C = {1, 3}, A ∩ C = ∅
A ∩ B ∩ C = ∅, $\bar{A}$ = {1, 3, 5} = C, $\bar{B}$ = {5, 6}
$\bar{C}$ = {2, 4, 6} = A, A - B = {6}, B - A = {1, 3}
C - B = {5}, (A - B) ∪ B = {1, 2, 3, 4, 6}


The student should verify these results. 
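
One way to carry out this verification is with a language that has built-in set operations. The sketch below uses Python sets as an illustrative choice; the `complement` helper is a hypothetical name, defined here relative to the space S.

```python
# The space and the three sets from the example in the text.
S = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}
B = {1, 2, 3, 4}
C = {1, 3, 5}

def complement(X):
    """Elements of the space S that are not in X."""
    return S - X

results = {
    "A u B": A | B,                   # sum (union)
    "B u C": B | C,
    "A n B": A & B,                   # product (intersection)
    "B n C": B & C,
    "A n C": A & C,
    "A n B n C": A & B & C,
    "complement of A": complement(A),
    "complement of B": complement(B),
    "complement of C": complement(C),
    "A - B": A - B,                   # difference
    "B - A": B - A,
    "C - B": C - B,
    "(A - B) u B": (A - B) | B,
}

for name, value in results.items():
    print(name, "=", sorted(value))
```
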





Exercise 1—5.1 


If A and B are subsets in the same space S, find:
a) (A - B) ∩ (B - A)
b) (A - B) ∩ $\bar{B}$
c) (A - B) ∪ (A ∩ B)


Answers: A, (A - B), ∅


Exercise 1—5.2 
A space S = {a, b, c, d, e, f} has two subsets defined as A = {a, c,
e} and B = {c, d, e, f}. Find:

a) A ∪ B
b) A ∩ B
c) (A - B)
d) $\bar{A}$ ∩ B
e) A ∩ $\bar{B}$
f) (B - A) ∪ A


Answers: {a, c, d, e, f}, {a}, {d, f}, {a, c, d, e, f}, {a}, {c, e}





1—6 The Axiomatic Approach 


It is now necessary to relate probability theory to the set concepts that have just 
been discussed. This relationship is established by defining a probability space 
whose elements are all the outcomes (of a possible set of outcomes) from an 
experiment. For example, if an experimenter chooses to view the six faces of a 
die as the possible outcomes, then the probability space associated with throwing 
a die is the set 


S = {1, 2, 3, 4, 5, 6}


The various subsets of S can be identified with the events. For example, in the 
case of throwing a die, the event {2} corresponds to obtaining the outcome 2, 
while the event {1, 2, 3} corresponds to the outcomes of either 1, or 2, or 3. Since
at least one outcome must be obtained on each trial, the space S corresponds to
the certain event and the empty set ∅ corresponds to the impossible event. Any
event consisting of a single element is called an elementary event. 

The next step is to assign to each event a number called, as before, the proba- 
bility of the event. If the event is denoted as A, the probability of event A is 
denoted as Pr (A). This number is chosen so as to satisfy the following three 
conditions or axioms: 


$$\Pr(A) \geq 0 \tag{1-9}$$
$$\Pr(S) = 1 \tag{1-10}$$
$$\text{If } A \cap B = \varnothing, \text{ then } \Pr(A \cup B) = \Pr(A) + \Pr(B) \tag{1-11}$$


The whole body of probability can be deduced from these axioms. It should be 
emphasized, however, that axioms are postulates and, as such, it is meaningless 
to try to prove them. The only possible test of their validity is whether the result- 
ing theory adequately represents the real world. The same is true of any physical 
theory. 

A large number of corollaries can be deduced from these axioms and a few are 
developed here. First, since 


$S \cap \varnothing = \varnothing$ and $S \cup \varnothing = S$

it follows from (1-11) that

$$\Pr(S \cup \varnothing) = \Pr(S) = \Pr(S) + \Pr(\varnothing)$$

Hence,

$$\Pr(\varnothing) = 0 \tag{1-12}$$

Next, since

$A \cap \bar{A} = \varnothing$ and $A \cup \bar{A} = S$

it also follows from (1-11) that

$$\Pr(A \cup \bar{A}) = \Pr(A) + \Pr(\bar{A}) = \Pr(S) = 1 \tag{1-13}$$

From this and from (1-9)

$$\Pr(A) = 1 - \Pr(\bar{A}) \leq 1 \tag{1-14}$$


Therefore, the probability of an event must be a number between 0 and 1.

If A and B are not mutually exclusive, then (1—11) usually does not hold. A 
more general result can be obtained, however. From the Venn diagram of Figure 
1—3 it is apparent that 


$$A \cup B = A \cup (\bar{A} \cap B)$$

and that A and $\bar{A} \cap B$ are mutually exclusive. Hence, from (1-11) it follows that

$$\Pr(A \cup B) = \Pr[A \cup (\bar{A} \cap B)] = \Pr(A) + \Pr(\bar{A} \cap B)$$

From the same figure it is also apparent that

$$B = (A \cap B) \cup (\bar{A} \cap B)$$

and that $A \cap B$ and $\bar{A} \cap B$ are mutually exclusive. From (1-11)

$$\Pr(B) = \Pr[(A \cap B) \cup (\bar{A} \cap B)] = \Pr(A \cap B) + \Pr(\bar{A} \cap B) \tag{1-15}$$

Upon eliminating $\Pr(\bar{A} \cap B)$, it follows that

$$\Pr(A \cup B) = \Pr(A) + \Pr(B) - \Pr(A \cap B) \leq \Pr(A) + \Pr(B) \tag{1-16}$$
which is the desired result. 

Now that the formalism of the axiomatic approach has been established, it is 
desirable to look at the problem of constructing probability spaces. First consider 
the case of throwing a single die and the associated probability space of S = {1, 
2, 3, 4, 5, 6}. The elementary events are simply the integers associated with the 
upper face of the die and these are clearly mutually exclusive. If the elementary 
events are assumed to be equally probable, then the probability associated with 


each is simply 


$$\Pr\{\alpha_i\} = \frac{1}{6}, \qquad i = 1, 2, \ldots, 6$$

Note that this assumption is consistent with the relative-frequency approach, but
within the framework of the axiomatic approach it is only an assumption, and any 
number of other assumptions could have been made. 

For this same probability space, consider the event A = {1, 3} = {1} ∪ {3}.
From (1-11)

$$\Pr(A) = \Pr\{1\} + \Pr\{3\} = \frac{1}{6} + \frac{1}{6} = \frac{1}{3}$$

and this can be interpreted as the probability of throwing either a 1 or a 3. A
somewhat more complex situation arises when A = {1, 3}, B = {3, 5} and it is
desired to determine Pr (A ∪ B). Since A and B are not mutually exclusive, the
result of (1-16) must be used. From the calculation above, it is clear that
Pr (A) = Pr (B) = 1/3. However, since A ∩ B = {3}, an elementary event, it must be
that Pr (A ∩ B) = 1/6. Hence, from (1-16)

$$\Pr(A \cup B) = \Pr(A) + \Pr(B) - \Pr(A \cap B) = \frac{1}{3} + \frac{1}{3} - \frac{1}{6} = \frac{1}{2}$$

An alternative approach is to note that A ∪ B = {1, 3, 5}, which is composed of
three mutually exclusive elementary events. Using (1-11) twice leads immediately
to

$$\Pr(A \cup B) = \Pr\{1\} + \Pr\{3\} + \Pr\{5\} = \frac{1}{6} + \frac{1}{6} + \frac{1}{6} = \frac{1}{2}$$
Note that this can be interpreted as the probability of either A occurring or B 
occurring or both occurring. 
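
The two routes to Pr (A ∪ B) in the die example can be compared in a few lines. The sketch below assumes equally probable elementary events, as in the text; the function name `pr` is a hypothetical helper that simply counts elements.

```python
from fractions import Fraction

# Probability space for a single die with equally probable elementary events.
S = frozenset({1, 2, 3, 4, 5, 6})

def pr(event):
    """Probability of an event, taken as a subset of S."""
    return Fraction(len(event & S), len(S))

A = {1, 3}
B = {3, 5}

# Route 1: the general result (1-16).
via_1_16 = pr(A) + pr(B) - pr(A & B)

# Route 2: treat A u B = {1, 3, 5} as three mutually exclusive elementary events.
direct = sum(pr({outcome}) for outcome in A | B)

print(via_1_16, direct)
```

Both routes give the same value, and the same helper confirms that Pr (S) = 1 and that the empty set has probability zero.
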





Exercise 1—6.1 


A dodecahedron is a solid with twelve sides and is often used to dis- 
play the twelve months of the year. When this object is rolled, let the 
outcome be taken as the month appearing on the upper face. Also let 
A = {January}, B = {Any month with 30 days}, and C = {Any month 
with 31 days}. Find 


a) Pr (A ∪ C)
b) Pr (A ∩ C)
c) Pr (B ∪ C)
d) Pr (A ∩ B).


Answers: 11/12, 0, 1/12, 7/12 


Exercise 1—6.2 


Draw a Venn diagram showing three subsets that are not mutually 
exclusive. Using this diagram derive an expression for Pr (A ∪ B ∪ C).


Answer: Pr (A) + Pr (B) + Pr (C) - Pr (A ∩ B) - Pr (A ∩ C) -
Pr (B ∩ C) + Pr (A ∩ B ∩ C)





1—7 Conditional Probability 


The concept of conditional probability was introduced in Section 1-4 on the basis
of the relative frequency of one event when another event is specified to have 
occurred. In the axiomatic approach, conditional probability is a defined quantity. 
If an event B is assumed to have a nonzero probability, then the conditional prob- 
ability of an event A, given B, is defined as 

$$\Pr(A \mid B) = \frac{\Pr(A \cap B)}{\Pr(B)}, \qquad \Pr(B) > 0 \tag{1-17}$$

where Pr (A ∩ B) is the probability of the event A ∩ B. In the previous discussion,
the numerator of (1-17) was written as Pr (A,B) and was called the joint
probability of events A and B. This interpretation is still correct if A and B are
elementary events, but in the more general case the proper interpretation must be
based on the set theory concept of the product, A ∩ B, of two sets. Obviously, if
A and B are mutually exclusive, then A ∩ B is the empty set and Pr (A ∩ B) =
0. On the other hand, if A is contained in B (that is, A ⊂ B), then A ∩ B = A
and

$$\Pr(A \mid B) = \frac{\Pr(A)}{\Pr(B)} \geq \Pr(A)$$

Finally, if B ⊂ A, then A ∩ B = B and

$$\Pr(A \mid B) = \frac{\Pr(B)}{\Pr(B)} = 1 \geq \Pr(A)$$

In general, however, when neither A ⊂ B nor B ⊂ A, nothing can be asserted
regarding the relative magnitudes of Pr (A) and Pr (A | B). 

So far it has not yet been shown that conditional probabilities are really proba- 
bilities in the sense that they satisfy the basic axioms. In the relative-frequency 
approach they are clearly probabilities in that they could be defined as ratios of 
the numbers of favorable occurrences to the total number of trials, but in the 
axiomatic approach conditional probabilities are defined quantities; hence, it is 
necessary to verify independently their validity as probabilities. 

The first axiom is 


$$\Pr(A \mid B) \geq 0$$


and this is obviously true from the definition (1-17) since the numerator is
nonnegative and the denominator is positive. The second axiom is


Pr (S | B) = 1 


and this is also apparent since B ⊂ S so that S ∩ B = B and Pr (S ∩ B) = Pr
(B). In order to verify that the third axiom holds, consider another event, C, such
that A ∩ C = ∅ (that is, A and C are mutually exclusive). Then

$$\Pr[(A \cup C) \cap B] = \Pr[(A \cap B) \cup (C \cap B)] = \Pr(A \cap B) + \Pr(C \cap B)$$

since (A ∩ B) and (C ∩ B) are also mutually exclusive events and (1-11) holds
for such events. So, from (1-17)

$$\Pr[(A \cup C) \mid B] = \frac{\Pr[(A \cup C) \cap B]}{\Pr(B)} = \frac{\Pr(A \cap B)}{\Pr(B)} + \frac{\Pr(C \cap B)}{\Pr(B)} = \Pr(A \mid B) + \Pr(C \mid B)$$


Thus the third axiom does hold, and it is now clear that conditional probabilities 
are valid probabilities in every sense. 

Before extending the topic of conditional probabilities, it is desirable to con- 
sider an example in which the events are not elementary events. Let the experi- 
ment be the throwing of a single die and let the outcomes be the integers from 1 
to 6. Then define event A as A = {1,2}, that is, the occurrence of a 1 or a 2. 
From previous considerations it is clear that Pr (A) = 1/6 + 1/6 = 1/3. Define B
as the event of obtaining an even number. That is, B = {2,4,6} and Pr (B) = 1/2
since it is composed of three elementary events. The event A ∩ B is A ∩ B =
{2}, from which Pr (A ∩ B) = 1/6. The conditional probability, Pr (A | B), is now
given by

$$\Pr(A \mid B) = \frac{1/6}{1/2} = \frac{1}{3}$$

This indicates that the conditional probability of throwing a 1 or a 2, given that
the outcome is even, is 1/3.

On the other hand, suppose it is desired to find the conditional probability of
throwing an even number given that the outcome was a 1 or a 2. This is

$$\Pr(B \mid A) = \frac{1/6}{1/3} = \frac{1}{2}$$


a result that is intuitively correct. 
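
The definition (1-17) and the die example can be reproduced with a short computation. The sketch below again assumes equally probable faces; `pr` and `pr_given` are hypothetical helper names, and the guard on a zero-probability condition mirrors the requirement Pr (B) > 0.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}

def pr(event):
    """Probability of an event (a subset of S) for a fair die."""
    return Fraction(len(event & S), len(S))

def pr_given(a, b):
    """Conditional probability Pr(a | b) from the definition (1-17)."""
    if pr(b) == 0:
        raise ValueError("Pr(b) must be nonzero")
    return pr(a & b) / pr(b)

A = {1, 2}       # a 1 or a 2
B = {2, 4, 6}    # an even number

print(pr_given(A, B), pr_given(B, A))

# Third axiom for conditional probabilities, with C disjoint from A.
C = {5, 6}
print(pr_given(A | C, B) == pr_given(A, B) + pr_given(C, B))
```
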

One of the uses of conditional probability is in the evaluation of total probability.
Suppose there are n mutually exclusive events $A_1, A_2, \ldots, A_n$ and an
arbitrary event B as shown in the Venn diagram of Figure 1-7. The events $A_i$
occupy the entire space, S, so that

$$A_1 \cup A_2 \cup \cdots \cup A_n = S \tag{1-18}$$

Since $A_i$ and $A_j$ (i ≠ j) are mutually exclusive, it follows that $B \cap A_i$ and $B \cap A_j$
are also mutually exclusive. Further,

$$B = B \cap (A_1 \cup A_2 \cup \cdots \cup A_n) = (B \cap A_1) \cup (B \cap A_2) \cup \cdots \cup (B \cap A_n)$$

because of (1-18). Hence, from (1-11),

$$\Pr(B) = \Pr(B \cap A_1) + \Pr(B \cap A_2) + \cdots + \Pr(B \cap A_n) \tag{1-19}$$

But from (1-17)

$$\Pr(B \cap A_i) = \Pr(B \mid A_i)\,\Pr(A_i)$$





Figure 1-7. Venn diagram for total probability. 
Table 1-3. Resistance Values.

                         Bin Numbers
Ohms          1      2      3      4      5      6    Total
10 Ω        500      0    200    800   1200   1000     3700
100 Ω       300    400    600    200    800      0     2300
1000 Ω      200    600    200    600      0   1000     2600
Total      1000   1000   1000   1600   2000   2000     8600

Substituting into (1-19) yields

$$\Pr(B) = \Pr(B \mid A_1)\,\Pr(A_1) + \Pr(B \mid A_2)\,\Pr(A_2) + \cdots + \Pr(B \mid A_n)\,\Pr(A_n) \tag{1-20}$$

The quantity Pr (B) is the total probability and is expressed in (1—20) in terms of 
its various conditional probabilities. 

An example serves to illustrate an application of total probability. Consider a 
resistor carrousel containing six bins. Each bin contains an assortment of resistors 
as shown in Table 1-3. If one of the bins is selected at random,* and a single
resistor drawn from that bin at random, what is the probability that the resistor
chosen will be 10 Ω? The $A_i$ events in (1-20) can be associated with the bin
chosen so that

$$\Pr(A_i) = \frac{1}{6}, \qquad i = 1, 2, 3, 4, 5, 6$$

since it is assumed that the choices of bins are equally likely. The event B is the
selection of a 10-Ω resistor and the conditional probabilities can be related to the
numbers of such resistors in each bin. Thus

$$\Pr(B \mid A_1) = \frac{500}{1000} = \frac{1}{2} \qquad \Pr(B \mid A_2) = \frac{0}{1000} = 0$$
$$\Pr(B \mid A_3) = \frac{200}{1000} = \frac{2}{10} \qquad \Pr(B \mid A_4) = \frac{800}{1600} = \frac{1}{2}$$
$$\Pr(B \mid A_5) = \frac{1200}{2000} = \frac{6}{10} \qquad \Pr(B \mid A_6) = \frac{1000}{2000} = \frac{1}{2}$$

Hence, from (1-20) the total probability of selecting a 10-Ω resistor is

$$\Pr(B) = \frac{1}{2} \times \frac{1}{6} + 0 \times \frac{1}{6} + \frac{2}{10} \times \frac{1}{6} + \frac{1}{2} \times \frac{1}{6} + \frac{6}{10} \times \frac{1}{6} + \frac{1}{2} \times \frac{1}{6} = 0.3833$$

*The phrase "at random" is usually interpreted to mean "with equal probability."



It is worth noting that the concepts of equally likely events and relative frequency 
have been used in assigning values to the conditional probabilities above, but that 
the basic relationship expressed by (1-20) is derived from the axiomatic approach.
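
The total-probability calculation for the carrousel can be repeated for every resistance value in Table 1-3. The sketch below is illustrative only; it assumes equally likely bins, as in the text, and the names `bins`, `prior`, and `total_probability` are hypothetical.

```python
from fractions import Fraction

# Counts from Table 1-3: one list per resistance value, ordered by bin 1..6.
bins = {
    "10": [500, 0, 200, 800, 1200, 1000],
    "100": [300, 400, 600, 200, 800, 0],
    "1000": [200, 600, 200, 600, 0, 1000],
}
n_bins = 6
bin_totals = [sum(bins[value][i] for value in bins) for i in range(n_bins)]

# A priori probabilities Pr(A_i): each bin equally likely.
prior = [Fraction(1, n_bins)] * n_bins

def total_probability(value):
    """Pr(B) from (1-20): sum over bins of Pr(B | A_i) Pr(A_i)."""
    return sum(
        Fraction(bins[value][i], bin_totals[i]) * prior[i]
        for i in range(n_bins)
    )

for value in bins:
    print(value, "ohm:", float(total_probability(value)))
```

Because the three resistance values exhaust the possibilities, the three total probabilities computed this way must sum to one, which gives a simple consistency check on the table.
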

The probabilities Pr ($A_i$) in (1-20) are often referred to as a priori probabilities
because they are the ones that describe the probabilities of the events $A_i$ before
any experiment is performed. After an experiment is performed, and event B observed,
the probabilities that describe the events $A_i$ are the conditional probabilities
Pr ($A_i$ | B). These probabilities may be expressed in terms of those already
discussed by rewriting (1-17) as

$$\Pr(A_i \cap B) = \Pr(A_i \mid B)\,\Pr(B) = \Pr(B \mid A_i)\,\Pr(A_i)$$

The last form in the above is obtained by simply interchanging the roles of B and
$A_i$. The second equality may now be written

$$\Pr(A_i \mid B) = \frac{\Pr(B \mid A_i)\,\Pr(A_i)}{\Pr(B)}, \qquad \Pr(B) \neq 0 \tag{1-21}$$

into which (1-20) may be substituted to yield

$$\Pr(A_i \mid B) = \frac{\Pr(B \mid A_i)\,\Pr(A_i)}{\Pr(B \mid A_1)\,\Pr(A_1) + \cdots + \Pr(B \mid A_n)\,\Pr(A_n)} \tag{1-22}$$

The conditional probability Pr ($A_i$ | B) is often called the a posteriori probability
because it applies after the experiment is performed; and either (1-21) or (1-22)
is referred to as Bayes' theorem.

The a posteriori probability may be illustrated by continuing the example just
discussed. Suppose the resistor that is chosen from the carrousel is found to be a
10-Ω resistor. What is the probability that it came from bin three? Since B is still
the event of selecting a 10-Ω resistor, the conditional probabilities Pr (B | $A_i$) are
the same as tabulated before. Furthermore, the a priori probabilities are still 1/6.
Thus, from (1-21), and the previous evaluation of Pr (B),

$$\Pr(A_3 \mid B) = \frac{\left(\frac{2}{10}\right)\left(\frac{1}{6}\right)}{0.3833} = 0.0869$$

This is the probability that the 10-Ω resistor, chosen at random, came from bin
three.
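
Bayes' theorem in the form (1-22) lends itself to a direct computation of all six a posteriori probabilities at once. The sketch below uses the 10-Ω row of Table 1-3 and exact fractions; the variable names (`likelihood`, `evidence`, `posterior`) are hypothetical labels for Pr (B | A_i), Pr (B), and Pr (A_i | B).

```python
from fractions import Fraction

# 10-ohm counts and bin totals from Table 1-3, ordered by bin 1..6.
counts_10_ohm = [500, 0, 200, 800, 1200, 1000]
bin_totals = [1000, 1000, 1000, 1600, 2000, 2000]

prior = [Fraction(1, 6)] * 6                      # a priori Pr(A_i)
likelihood = [Fraction(c, t)                      # Pr(B | A_i)
              for c, t in zip(counts_10_ohm, bin_totals)]

# Denominator of (1-22): the total probability Pr(B).
evidence = sum(l * p for l, p in zip(likelihood, prior))

# A posteriori probabilities Pr(A_i | B) for every bin.
posterior = [l * p / evidence for l, p in zip(likelihood, prior)]

for bin_number, value in enumerate(posterior, start=1):
    print("bin", bin_number, float(value))
```

The entry for bin three is exactly 2/23, or about 0.087, which agrees with the value quoted above to within rounding of the intermediate result 0.3833. The a posteriori probabilities over all bins sum to one, as they must.
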

Exercise 1-7.1


Using the data of Table 1-3, find the probabilities:
a) a 1000-Ω resistor that is selected came from bin 3.
b) a 10-Ω resistor that is selected came from bin 5.


Answers: 0.2609, 0.1067 


Exercise 1-7.2 


A manufacturer of electronic equipment purchases 1000 ICs from sup- 
plier A, 2000 ICs from supplier B, and 3000 ICs from supplier C. Test- 
ing reveals that the conditional probability of an IC failing during burn- 
in is, for devices from each of the suppliers 


Pr (F | A) = 0.1, Pr (F | B) = 0.05, Pr (F | C) = 0.08 


The ICs from all suppliers are mixed together and one device is se- 
lected at random. 


a) What is the probability that it will fail during burn-in? 
b) Given that the device fails, what is the probability that the de- 
vice came from supplier A? 


Answers: 0.0733, 0.2273 





1—8 Independence 


The concept of statistical independence is a very important one in probability. It 
was introduced in connection with the relative-frequency approach by considering 
two trials of an experiment, such as tossing a coin, in which it is clear that the 
second trial cannot depend upon the outcome of the first trial in any way. Now 
that a more general formulation of events is available, this concept can be ex- 
tended. The basic definition is unchanged, however. It is 


Two events, A and B, are independent if and only if

$$\Pr(A \cap B) = \Pr(A)\,\Pr(B) \tag{1-23}$$

In many physical situations, independence of events is assumed because there
is no apparent physical mechanism by which one event can depend upon the other. 
In other cases, the assumed probabilities of the elementary events lead to indepen- 
dence of other events defined from these. In such cases, independence may not be 
obvious, but can be established from (1—23). 

The concept of independence can also be extended to more than two events. 
For example, with three events, the conditions for independence are 


$\Pr(A_1 \cap A_2) = \Pr(A_1)\,\Pr(A_2)$
$\Pr(A_1 \cap A_3) = \Pr(A_1)\,\Pr(A_3)$
$\Pr(A_2 \cap A_3) = \Pr(A_2)\,\Pr(A_3)$
$\Pr(A_1 \cap A_2 \cap A_3) = \Pr(A_1)\,\Pr(A_2)\,\Pr(A_3)$


Note that four conditions must be satisfied, and that pairwise independence is not 
sufficient for the entire set of events to be mutually independent. In general, if 
there are n events, it is necessary that 


$\Pr(A_i \cap A_j \cap \cdots \cap A_k) = \Pr(A_i)\,\Pr(A_j) \cdots \Pr(A_k)$    (1-24)


for every set of two or more distinct integers i, j, . . . , k less than or equal
to n. This implies that $2^n - (n + 1)$ equations of the form (1-24) are required
to establish the independence of n events.
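
The counting argument above can be illustrated with a short Python sketch. It assumes a finite sample space of equally likely outcomes and represents each event as a set of outcomes. The function names and the two-coin example (A: first coin is a head, B: second coin is a head, C: the two coins differ) are illustrative choices and do not come from the text.

```python
from fractions import Fraction
from itertools import combinations, product


def pr(event, space):
    """Probability of an event when every outcome in `space` is equally likely."""
    return Fraction(len(event), len(space))


def check_independence(events, space):
    """Test the product rule for every group of two or more events.

    Returns the number of equations tested and the groups that fail.
    """
    tested, failures = 0, []
    for size in range(2, len(events) + 1):
        for names in combinations(sorted(events), size):
            joint = set(space)
            rhs = Fraction(1)
            for name in names:
                joint &= events[name]           # intersection of the group
                rhs *= pr(events[name], space)  # product of the probabilities
            tested += 1
            if pr(joint, space) != rhs:
                failures.append(names)
    return tested, failures


# Two fair coins: four equally likely outcomes.
space = set(product("HT", repeat=2))
events = {
    "A": {s for s in space if s[0] == "H"},   # first coin is a head
    "B": {s for s in space if s[1] == "H"},   # second coin is a head
    "C": {s for s in space if s[0] != s[1]},  # the two coins differ
}

tested, failures = check_independence(events, space)
print(tested)    # 4 equations, which equals 2**3 - (3 + 1)
print(failures)  # only the three-event condition fails
```

In this example the three pairwise conditions hold, but the three-event condition does not, so the events are independent in pairs without being mutually independent.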

One important consequence of independence is a special form of (1—16), which 
stated 


$\Pr(A \cup B) = \Pr(A) + \Pr(B) - \Pr(A \cap B)$

If A and B are independent events, this becomes

$\Pr(A \cup B) = \Pr(A) + \Pr(B) - \Pr(A)\,\Pr(B)$    (1-25)

Another result of independence is

$\Pr[A_1 \cap (A_2 \cup A_3)] = \Pr(A_1)\,\Pr(A_2 \cup A_3)$    (1-26)


if $A_1$, $A_2$, and $A_3$ are all independent. This is not true if they are only independent
in pairs. In general, if $A_1, A_2, \ldots, A_n$ are independent events, then any one of
them is independent of any event formed by sums, products, and complements of
the others.

Examples of physical situations that illustrate independence are most often as- 
sociated with two or more trials of an experiment. However, for purposes of 
illustration, consider two events associated with a single experiment. Let the ex- 
periment be that of rolling a pair of dice and define event A as that of obtaining a 
7 and event B as that of obtaining an 11. Are these events independent? The 
answer is that they cannot be independent because they are mutually exclusive— 
if one occurs the other one cannot. Mutually exclusive events can never be statis- 
tically independent. 

As a second example consider two events that are not mutually exclusive. For 
the pair of dice above, define event A as that of obtaining an odd number and 
event B as that of obtaining an 11. The event $A \cap B$ is just B since B is a subset
of A. Hence, $\Pr(A \cap B) = \Pr(B) = \Pr(11) = 2/36 = 1/18$ since there are
two ways an 11 can be obtained (that is, a 5 and a 6 or a 6 and a 5). Also,
$\Pr(A) = 1/2$ since half of all outcomes are odd. It follows then that


$\Pr(A \cap B) = 1/18 \neq \Pr(A)\,\Pr(B) = (1/2) \cdot (1/18) = 1/36$


Thus, events A and B are not statistically independent. That this must be the case is 
obvious since if B occurs then A must also occur, although the converse is not true. 

It is also possible to define events associated with a single trial that are inde- 
pendent, but these sets may not represent any physical situation. For example, 
consider throwing a single die and define two events as A = {1, 2, 3} and B =
{3, 4}. From previous results it is clear that Pr (A) = 1/2 and Pr (B) = 1/3. The
event $(A \cap B)$ contains a single element {3}; hence, $\Pr(A \cap B) = 1/6$. Thus, it
follows that

$\Pr(A \cap B) = \dfrac{1}{6} = \Pr(A)\,\Pr(B) = \dfrac{1}{2} \cdot \dfrac{1}{3} = \dfrac{1}{6}$

and events A and B are independent, although the physical significance of this is 
not intuitively clear. The next section considers situations in which there is more 
than one experiment, or more than one trial of a given experiment, and that dis- 
cussion will help clarify the matter. 
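
The three examples above can be checked by direct enumeration. The following Python sketch assumes equally likely outcomes and tests condition (1-23) with exact fractions; the helper names `pr` and `independent` are illustrative.

```python
from fractions import Fraction
from itertools import product


def pr(event, space):
    """Probability of an event (a set of outcomes) in an equally likely space."""
    return Fraction(len(event & space), len(space))


def independent(a, b, space):
    """True if Pr(A and B) equals Pr(A) Pr(B), as required by (1-23)."""
    return pr(a & b, space) == pr(a, space) * pr(b, space)


# Pair of dice: 36 equally likely ordered outcomes.
dice = set(product(range(1, 7), repeat=2))
seven = {d for d in dice if sum(d) == 7}
eleven = {d for d in dice if sum(d) == 11}
odd = {d for d in dice if sum(d) % 2 == 1}

print(independent(seven, eleven, dice))  # False: mutually exclusive events
print(independent(odd, eleven, dice))    # False: an 11 is always odd

# Single die with A = {1, 2, 3} and B = {3, 4}.
die = set(range(1, 7))
A = {1, 2, 3}
B = {3, 4}
print(independent(A, B, die))            # True: 1/6 = (1/2)(1/3)
```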





Exercise 1—8.1 
A card is selected at random from a standard deck of 52 cards. Let A be 
the event of selecting an ace, and let B be the event of selecting a spade. 
Are these events statistically independent? Prove your answer. 
Answer: Yes 


Exercise 1—8.2 


In the switching circuit shown below, the switches are assumed to 
operate randomly and independently. 


[Figure: switching circuit, not reproduced]

If each switch has a probability of 0.2 of being closed, find the probability
that there is a complete path through the circuit.


Answer: 0.0464 





1—9 Combined Experiments 


In the discussion of probability presented thus far, the probability space, S, was 
associated with a single experiment. This concept is too restrictive to deal with 
many realistic situations, so it is necessary to generalize it somewhat. Consider a 
situation in which two experiments are performed. For example, one experiment 
might be throwing a die and the other one tossing a coin. It is then desired to find 
the probability that the outcome is, say, a "3" on the die and a "tail" on the
coin. In other situations the second experiment might be simply a repeated trial of 
the first experiment. The two experiments, taken together, form a combined ex- 
periment, and it is now necessary to find the appropriate probability space for it. 

Let one experiment have a space $S_1$ and the other experiment a space $S_2$.
Designate the elements of $S_1$ as

$S_1 = \{\alpha_1, \alpha_2, \ldots, \alpha_n\}$

and those of $S_2$ as

$S_2 = \{\beta_1, \beta_2, \ldots, \beta_m\}$


Then form a new space, called the cartesian product space, whose elements are
all the ordered pairs $(\alpha_1, \beta_1), (\alpha_1, \beta_2), \ldots, (\alpha_i, \beta_j), \ldots, (\alpha_n, \beta_m)$. Thus, if
$S_1$ has n elements and $S_2$ has m elements, the cartesian product space has mn
elements. The cartesian product space may be denoted as

$S = S_1 \times S_2$

to distinguish it from the previous product or intersection discussed in Section
1-5.

As an illustration of the cartesian product space for combined experiments, 
consider the die and the coin discussed above. For the die the space is 


$S_1 = \{1, 2, 3, 4, 5, 6\}$

while for the coin it is

$S_2 = \{H, T\}$


Thus, the cartesian product space has 12 elements and is 

$S = S_1 \times S_2 = \{(1, H), (1, T), (2, H), (2, T), (3, H), (3, T), (4, H), (4, T), (5, H), (5, T), (6, H), (6, T)\}$


It is now necessary to define the events of the new probability space. If $A_1$ is a
subset considered to be an event in $S_1$, and $A_2$ is a subset considered to be an
event in $S_2$, then $A = A_1 \times A_2$ is an event in S. For example, in the above
illustration let $A_1 = \{1, 3, 5\}$ and $A_2 = \{H\}$. The event A corresponding to these is


$A = A_1 \times A_2 = \{(1, H), (3, H), (5, H)\}$


In order to specify the probability of event A, it is necessary to consider whether
the two experiments are independent; the only cases discussed here are those in
which they are independent. In such cases the probability in the product space is
simply the product of the probabilities in the original spaces. Thus, if $\Pr(A_1)$ is
the probability of event $A_1$ in space $S_1$, and $\Pr(A_2)$ is the probability of $A_2$ in
space $S_2$, then the probability of event A in space S is

$\Pr(A) = \Pr(A_1 \times A_2) = \Pr(A_1)\,\Pr(A_2)$    (1-27)

This result may be illustrated by data from the above example. From previous
results, $\Pr(A_1) = 1/6 + 1/6 + 1/6 = 1/2$ when $A_1 = \{1, 3, 5\}$ and $\Pr(A_2) = 1/2$
when $A_2 = \{H\}$. Thus, the probability of getting an odd number on the die and a
head on the coin is

$\Pr(A) = \left(\dfrac{1}{2}\right)\left(\dfrac{1}{2}\right) = \dfrac{1}{4}$
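
The die-and-coin illustration can be reproduced with a brief Python sketch. It builds the cartesian product space with `itertools.product`, forms the event of an odd number and a head, and compares the probability obtained by counting with the product of the two separate probabilities. Equally likely outcomes and independent experiments are assumed, and the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

S1 = [1, 2, 3, 4, 5, 6]   # outcomes of the die
S2 = ["H", "T"]           # outcomes of the coin

# Cartesian product space: all ordered pairs (die, coin).
S = list(product(S1, S2))
print(len(S))             # 12 elements

A1 = {1, 3, 5}            # odd number on the die
A2 = {"H"}                # head on the coin

# Event A in the product space, built from A1 and A2.
A = [(die, coin) for (die, coin) in S if die in A1 and coin in A2]
print(A)                  # [(1, 'H'), (3, 'H'), (5, 'H')]

# Probability by counting equally likely pairs in the product space.
pr_counted = Fraction(len(A), len(S))

# Probability from the product of the probabilities in the original spaces.
pr_product = Fraction(len(A1), len(S1)) * Fraction(len(A2), len(S2))

print(pr_counted, pr_product)  # both are 1/4
```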


It is possible to generalize the above ideas in a straightforward manner to situ- 
ations in which there are more than two experiments. However, this will be done 
only for the more specialized situation of repeating the same experiment an arbi- 
trary number of times. 





Exercise 1—9.1 
A combined experiment is performed by rolling a die with sides num- 
bered from 1 to 6 and a child's block with sides labeled A through F. 
a) Write all of the elements in the cartesian product space. 


b) Define K as the event of obtaining an even number on the die 
and a letter of B or C on the block and find the probability of 
the event K. 


Answer: 1/6 







Exercise 1—9.2 


A combined experiment is performed by flipping a coin three times. 


a) Write all of the elements in the product space by indicating 
them as HHH, HTH, TTH, etc. 


b) Find the probability of obtaining exactly two heads. 
c) Find the probability of obtaining more than one head.


Answers: 1/2, 3/8 





1—10 Bernoulli Trials 


The situation considered here is one in which the same experiment is repeated n 
times and it is desired to find the probability that a particular event occurs exactly 
k of these times. For example, what is the probability that exactly four heads will 
be observed when a coin is tossed ten times? Such repeated experiments are re- 
ferred to as Bernoulli trials. 

Consider some experiment for which the event A has a probability Pr (A) = p.
Hence, the probability that the event does not occur is $\Pr(\bar{A}) = q$, where
p + q = 1.* Then repeat this experiment n times and assume that the trials are
independent; that is, that the outcome of any one trial does not depend in any way
upon the outcomes of any previous (or future) trials. Next determine the probabil- 
ity that event A occurs exactly k times in some specific order, say in the first k 
trials and none thereafter. Because the trials are independent, the probability of 
this event is 


$\underbrace{\Pr(A)\,\Pr(A) \cdots \Pr(A)}_{k \text{ of these}}\;\underbrace{\Pr(\bar{A})\,\Pr(\bar{A}) \cdots \Pr(\bar{A})}_{n-k \text{ of these}} = p^k q^{n-k}$


However, there are many other ways in which the event A could occur exactly k
times because the occurrences can arise in any order. Furthermore, because of the
independence, all of these other orders have exactly the same probability as the
one specified above. Hence, the event that A occurs k times in any order is the
sum of the mutually exclusive events that A occurs k times in some specific order,
and thus, the probability that A occurs k times is simply the above probability for
a particular order multiplied by the number of different orders that can occur.


*The only justification for changing the notation from Pr (A) to p and from $\Pr(\bar{A})$ to q is that the p
and q notation is traditional in discussing Bernoulli trials and most of the literature uses it.

It is necessary to digress at this point and briefly discuss the theory of combinations
in order to be able to determine the number of different orders in which
the event A can occur exactly k times in n trials. It is apparent that when one
forms a sequence of length n, the first A can go in any one of the n places, the
second A can go into any one of the remaining $n - 1$ places, and so on, leaving
$n - k + 1$ places for the kth A. The product of these possibilities counts each
sequence k! times, since the k! orders of the k event places are identical. Thus,
the total number of different sequences of length n containing exactly k As is

$\dfrac{1}{k!}\,[n(n-1)(n-2) \cdots (n-k+1)] = \dfrac{n!}{k!\,(n-k)!}$    (1-28)


The quantity on the right is simply the binomial coefficient, which is usually
denoted either as $_nC_k$ or as $\binom{n}{k}$.* The latter notation is employed here.

As an example of binomial coefficients, let n = 4 and k = 2. Then

$\binom{n}{k} = \dfrac{4!}{2!\,2!} = 6$

and there are six different sequences in which the event A occurs exactly twice.
These can be enumerated easily as

$AA\bar{A}\bar{A},\ A\bar{A}A\bar{A},\ A\bar{A}\bar{A}A,\ \bar{A}AA\bar{A},\ \bar{A}A\bar{A}A,\ \bar{A}\bar{A}AA$
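
The enumeration for n = 4 and k = 2 can be generated mechanically. In the following Python sketch the function name `sequences` is illustrative, "A" marks a trial on which the event occurs, and "-" marks a trial on which it does not (a stand-in for the overbar notation).

```python
from itertools import combinations
from math import comb


def sequences(n, k):
    """List every length-n sequence in which the event occurs exactly k times."""
    result = []
    # Choose which k of the n positions hold an occurrence of A.
    for positions in combinations(range(n), k):
        result.append("".join("A" if i in positions else "-" for i in range(n)))
    return result


seqs = sequences(4, 2)
print(seqs)                   # ['AA--', 'A-A-', 'A--A', '-AA-', '-A-A', '--AA']
print(len(seqs), comb(4, 2))  # 6 6
```

The count of generated sequences agrees with the binomial coefficient of (1-28) for any n and k, since each sequence corresponds to one choice of k positions out of n.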


It is now possible to write the desired probability of A occurring k times as 


$p_n(k) = \Pr\{A \text{ occurs } k \text{ times}\} = \binom{n}{k} p^k q^{n-k}$    (1-29)
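
As a check on (1-29), the following Python sketch compares the closed-form expression with a brute-force sum over every possible sequence of n independent trials. The function names are illustrative, and the brute-force version is practical only for small n because it visits all $2^n$ sequences.

```python
from itertools import product
from math import comb, isclose


def bernoulli_pmf(n, k, p):
    """Probability of exactly k occurrences in n independent trials, from (1-29)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def brute_force(n, k, p):
    """Same probability, obtained by summing over all 2**n ordered outcomes."""
    total = 0.0
    for outcome in product([True, False], repeat=n):
        if sum(outcome) == k:
            prob = 1.0
            for occurred in outcome:
                prob *= p if occurred else (1 - p)
            total += prob
    return total


# Exactly four heads in ten tosses of a fair coin.
formula = bernoulli_pmf(10, 4, 0.5)
enumerated = brute_force(10, 4, 0.5)
print(formula)                       # about 0.2051
print(isclose(formula, enumerated))  # True
```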


As an illustration of a possible application of this result, consider a digital
computer in which the binary digits (0 or 1) are organized into "words" of 32
digits each. If there is a probability of $10^{-3}$ that any one binary digit is incorrectly
read, what is the probability that there is one error in an entire word? For this
case, n = 32, k = 1, and p = $10^{-3}$. Hence,

$\Pr\{\text{one error in a word}\} = p_{32}(1) = \binom{32}{1} (10^{-3})^1 (0.999)^{31} = 32\,(0.999)^{31} (10^{-3}) \approx 0.031$


It is also possible to use (1-29) to find the probability that there will be no error
in a word. For this, k = 0 and $\binom{32}{0} = 1$. Thus,


*A table of binomial coefficients is given in Appendix C. 




$\Pr\{\text{no error in a word}\} = p_{32}(0) = \binom{32}{0} (10^{-3})^0 (0.999)^{32} = (0.999)^{32} = 0.9685$
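
The two word-error probabilities can be reproduced numerically. This Python sketch evaluates (1-29) for n = 32 and p = $10^{-3}$; the function and variable names are illustrative.

```python
from math import comb


def bernoulli_pmf(n, k, p):
    """Probability of exactly k occurrences in n independent trials, from (1-29)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n = 32      # binary digits per word
p = 1e-3    # probability that any one digit is read incorrectly

p_one_error = bernoulli_pmf(n, 1, p)
p_no_error = bernoulli_pmf(n, 0, p)

print(round(p_one_error, 4))  # 0.031
print(round(p_no_error, 4))   # 0.9685
```

The remaining probability, 1 minus the sum of these two values, is the probability of two or more errors in a word.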


There are many other practical applications of Bernoulli trials. For example, if a 
system has n components and there is a probability p that any one of them will 
fail, the probability that one and only one component will fail is 


$\Pr\{\text{one failure}\} = p_n(1) = \binom{n}{1} p\,q^{n-1}$


In some cases, one may be interested in determining the probability that event 
A occurs at least k times, or the probability that it occurs no more than k times.
These probabilities may be obtained by simply adding the probabilities of all the 
outcomes that are included in the desired event. For example, if a coin is tossed 
four times, what is the probability of obtaining at least two heads? For this case, 
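p = q = 1/2 and n = 4. From (1-29) the probability of getting two heads (that
is, k = 2) is

$p_4(2) = \binom{4}{2} \left(\dfrac{1}{2}\right)^2 \left(\dfrac{1}{2}\right)^2 = (6)\left(\dfrac{1}{4}\right)\left(\dfrac{1}{4}\right) = \dfrac{3}{8}$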


Similarly, the probability of three heads is 


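$p_4(3) = \binom{4}{3} \left(\dfrac{1}{2}\right)^3 \left(\dfrac{1}{2}\right)^1 = (4)\left(\dfrac{1}{8}\right)\left(\dfrac{1}{2}\right) = \dfrac{1}{4}$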


and the probability of four heads is 


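$p_4(4) = \binom{4}{4} \left(\dfrac{1}{2}\right)^4 \left(\dfrac{1}{2}\right)^0 = (1)\left(\dfrac{1}{16}\right)(1) = \dfrac{1}{16}$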


Hence, the probability of getting at least two heads is

$\Pr\{\text{at least two heads}\} = p_4(2) + p_4(3) + p_4(4) = \dfrac{3}{8} + \dfrac{1}{4} + \dfrac{1}{16} = \dfrac{11}{16}$

The general formulation of problems of this kind can be expressed quite easily, 
but there are several different situations that arise. These may be tabulated as 


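follows:

$\Pr\{A \text{ occurs less than } k \text{ times in } n \text{ trials}\} = \sum_{i=0}^{k-1} p_n(i)$

$\Pr\{A \text{ occurs more than } k \text{ times in } n \text{ trials}\} = \sum_{i=k+1}^{n} p_n(i)$

$\Pr\{A \text{ occurs no more than } k \text{ times in } n \text{ trials}\} = \sum_{i=0}^{k} p_n(i)$

$\Pr\{A \text{ occurs at least } k \text{ times in } n \text{ trials}\} = \sum_{i=k}^{n} p_n(i)$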
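
The four sums tabulated above translate directly into code. The following Python sketch uses exact fractions and reproduces the four-toss example; the function names are illustrative, and the only difference among the four functions is the range of the summation index.

```python
from fractions import Fraction
from math import comb


def pmf(n, i, p):
    """Probability of exactly i occurrences in n independent trials, from (1-29)."""
    return comb(n, i) * p**i * (1 - p) ** (n - i)


def less_than(n, k, p):
    return sum(pmf(n, i, p) for i in range(0, k))          # i = 0, ..., k-1


def more_than(n, k, p):
    return sum(pmf(n, i, p) for i in range(k + 1, n + 1))  # i = k+1, ..., n


def no_more_than(n, k, p):
    return sum(pmf(n, i, p) for i in range(0, k + 1))      # i = 0, ..., k


def at_least(n, k, p):
    return sum(pmf(n, i, p) for i in range(k, n + 1))      # i = k, ..., n


half = Fraction(1, 2)
print(at_least(4, 2, half))   # 11/16, at least two heads in four tosses
print(less_than(4, 2, half))  # 5/16, the complementary event
```

Because "less than k" and "at least k" are complementary events, their probabilities sum to one, and the same holds for "no more than k" and "more than k."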

A final comment in regard to Bernoulli trials has to do with evaluating $p_n(k)$ when
n is large. Since the binomial coefficients and the large powers of p and q become
difficult to evaluate in such cases, often it is necessary to seek simpler, but
approximate, ways of carrying out the calculation. One such approximation, known
as the DeMoivre-Laplace theorem, is useful if $npq \gg 1$ and if $|k - np|$ is on
the order of or less than $\sqrt{npq}$. This approximation is


$p_n(k) = \binom{n}{k} p^k q^{n-k} \approx \dfrac{1}{\sqrt{2\pi npq}}\, e^{-(k-np)^2/2npq}$    (1-30)


The DeMoivre-Laplace theorem has additional significance when continuous
probability is considered in a subsequent chapter. However, a simple illustration of its
utility in discrete probability is worthwhile. Suppose a coin is tossed 100 times
and it is desired to find the probability of k heads, where k is in the vicinity of
50. Since p = q = 1/2 and n = 100, (1-30) yields

$p_n(k) \approx \dfrac{1}{\sqrt{50\pi}}\, e^{-(k-50)^2/50}$

for k values ranging (roughly) from 40 to 60. This is obviously much easier to
evaluate than trying to find the binomial coefficient $\binom{100}{k}$ for the same range of
k values.
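
The quality of the approximation in the 100-toss example can be examined numerically. The following Python sketch evaluates both sides of (1-30) for a few values of k near 50; the function names are illustrative, and the exact value is computed with integer binomial coefficients, which modern software handles without difficulty.

```python
from math import comb, exp, pi, sqrt


def exact(n, k, p):
    """Exact probability of k occurrences in n trials, from (1-29)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def demoivre_laplace(n, k, p):
    """Approximate probability of k occurrences in n trials, from (1-30)."""
    q = 1 - p
    return exp(-((k - n * p) ** 2) / (2 * n * p * q)) / sqrt(2 * pi * n * p * q)


n, p = 100, 0.5
for k in (40, 45, 50, 55, 60):
    print(k, round(exact(n, k, p), 5), round(demoivre_laplace(n, k, p), 5))
```

For these values the two columns agree to within a few parts in a thousand of the exact value, which is consistent with the stated conditions on npq and on the distance of k from np.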





Exercise 1—10.1 


A pair of dice are tossed 8 times. 
a) Find the probability that a 7 will occur exactly 4 times. 
b) Find the probability that an 11 will occur 2 times. 
c) Find the probability that a 12 will occur more than once. 
Hint: Subtract the probability of a 12 occurring once or not at 
all from 1.0. 


Answers: 0.0613, 0.0193, 0.02605 


Exercise 1—10.2 


A file containing 8000 characters is to be transferred from one com- 
puter to another. The probability of any one character being trans- 
ferred in error is 0.001. 

a) Find the probability that the file can be transferred without any
errors.

b) Using the DeMoivre-Laplace theorem, find the probability that
there will be exactly 10 errors in the transferred file.

c) What must the probability of error in transferring one character
be in order to make the probability of transferring the entire file
without error as large as 0.99?


Answers: $3.341 \times 10^{-4}$, 0.1099, $1.256 \times 10^{-6}$





PROBLEMS 


Note that the first two digits of each problem number correspond to the section 
number in which the appropriate material is discussed. 


1-1.1


1-2.1 


A 6-cell storage battery having a nominal terminal voltage of 12 V is 
connected in series with an ammeter and a resistor labeled 6 Ω.


a) List as many random quantities as you can for this circuit. 


b) If the battery voltage can have any value between 10.5 and 12.5, 
the resistor can have any value within 5% of its marked value, and 
the ammeter reads within 2% of the true current, find the range of 
possible ammeter readings. Neglect ammeter resistance. 


c) List any nonrandom quantities you can for this circuit.


In determining the probability characteristics of printed English, it is
common to consider a 27-letter alphabet in which the space between
words is counted as a letter. Punctuation is usually ignored.


a) Count the number of times each of the 27 letters appears in this 
problem. 


b) On the basis of this count, deduce the most probable letter, the next 
most probable letter, and the least probable letter (or letters). 


For each of the following random experiments, list all of the possible 
outcomes and state whether these outcomes are equally likely. 


1-2.2 


1-4.2 




a) Flipping two coins. 


b) Observing the last digit of a telephone number selected at random 
from the directory. 

c) Observing the sum of the last two digits of a telephone number 
selected at random from the directory. 


State whether each of the following defined events is an elementary 
event or not. 

a) Obtaining a seven when a pair of dice are rolled. 

b) Obtaining two heads when three coins are flipped. 


c) Obtaining an ace when a card is selected at random from a deck of 
cards. 


d) Obtaining a two of spades when a card is selected at random from 
a deck of cards. 


e) Obtaining a two when a pair of dice are rolled. 

f) Obtaining three heads when three coins are flipped. 

g) Observing a value less than ten when a random voltage is observed. 

h) Observing the letter e sixteen times in a piece of text. 

If a die is rolled, determine the probability of each of the following 

events: 

a) Obtaining the number 5. 

b) Obtaining a number greater than 3. 

c) Obtaining an even number. 
If a pair of dice are rolled, determine the probability of each of the fol- 
lowing events: 

a) Obtaining a sum of 11. 

b) Obtaining a sum less than 5. 

c) Obtaining a sum that is an even number. 

A box of unmarked IC’s contains 200 hex inverters, 100 dual 4-input 


positive-AND gates, 50 dual J-K flip flops, 25 decade counters, and 25 
4-bit shift registers. 
1-4.4


a) If an IC is selected at random, what is the probability that it is a 
dual J-K flip flop? 


b) What is the probability that an IC selected at random is not a hex 
inverter? 


c) If the first IC selected is found to be a 4-bit shift register, what is 
the probability that the second IC selected will also be a 4-bit shift 
register? 


In the IC box of Problem 1—4.3 it is known that 10% of the hex invert- 
ers are bad, 15% of the dual 4-input positive-AND gates are bad, 18% 
of the dual J-K flip flops are bad, and 20% of the decade counters and 
4-bit shift registers are bad. 


a) If an IC is selected at random, what is the probability that it is both 
a decade counter and good? 


b) If an IC is selected at random and found to be a J-K flip flop, what
is the probability that it is good? 


c) If an IC is selected at random and found to be good, what is the 
probability that it is a decade counter? 


A company manufactures small electric motors having horsepower ratings
of 0.1, 0.5, or 1.0 horsepower and designed for operation with 120
V single-phase ac, 240 V single-phase ac, or 240 V three-phase ac. The
motor types can be distinguished only by their nameplates. A distributor
has on hand 3000 motors in the quantities shown in the table below.


Horsepower    120 V ac    240 V ac    240 V 3φ
0.1           900         400         0
0.5           200         500         100
1.0           100         200         600


One motor is discovered without a nameplate. For this motor determine 
the probability of each of the following events. 

a) The motor has a horsepower rating of 0.5 hp. 

b) The motor is designed for 240 V single-phase operation. 


c) The motor is 1.0 hp and is designed for 240 V three-phase opera- 
tion. 


d) The motor is 0.1 hp and is designed for 120 V operation. 


1-5.1 


1-5.2 


1-6.1 
In Problem 1-4.5, assume that 10% of the motors labeled 120 V single-phase
are mismarked and that 5% of the motors marked 240 V single-phase
are mismarked.


a) If a motor is selected at random, what is the probability that it is
mismarked?


b) If a motor is picked at random from those marked 240 V single-phase,
what is the probability that it is mismarked?


c) What is the probability that a motor selected at random is 0.5 hp
and mismarked?


Prove that a space S containing n elements has $2^n$ subsets. Hint: Use the
binomial expansion for $(1 + x)^n$.


A space S is defined as 


S = {1, 3, 5, 7, 9, 11} 


and three subsets as 
A = {1, 3, 5}, B = {7, 9, 11}, C = {1, 3, 9, 11}
Find: 


AUB AB OC (BNC) 
BUC A AC 
AUC B C-A 
ANB C A-B 
ANC ANB (A-B) UB 
BNC ANB (A-B) UC 


Draw and label the Venn diagram for Problem 1—4.4. 


Using the algebra of sets show that the following relations are true. 


a) AU (Af1B) 
A U (Bf) C) - (AU B) 1 (AU C) 


b) 


c) AU(An B) 


d) 


A 


AUB 
(AN B)U(ANB)U(ANB)=A 


For the space and subspaces defined in Problem 1—5.2, assume that each 
element has a probability of 1/6. Find the following probabilities. 


a) $\Pr(A)$    b) $\Pr(B)$    c) $\Pr(C)$
d) $\Pr(A \cup B)$    e) $\Pr(A \cup C)$    f) $\Pr[(A - C) \cup B]$
1-6.2


1-7.1 


1-7.2 


A card is drawn at random from a standard deck of 52 cards. Let A be 
the event that a king is drawn, B the event that a spade is drawn, and C 
the event that a ten of spades is drawn. Describe each of the events listed 
below and calculate its probability. 


a AUB b) ANB c) AUB 
d AUC e BUC D ANC 
g BNC h (AnBUC i AnBnc 


An experiment consists of randomly drawing three cards in succession 
from a standard deck of 52 cards. Let A be the event of a king on the 
first draw, B the event of a king on the second draw, and C the event of 
a king on the third draw. Describe each of the events listed below and 
calculate its probability. 


a ANB b AUB c) AUB 
d ANBAC e (AnBU(BnCc) f AUBUC 


Prove that $\Pr(A \cup B) = 1 - \Pr(\bar{A} \cap \bar{B})$.


Two solid-state diodes are connected in series. Each diode has a probability
of 0.05 that it will fail as a short circuit and a probability of 0.1
that it will fail as an open circuit. If the diodes are independent, what is 
the probability that the series connection of diodes will function as a 
diode? 


In a digital communication system, messages are encoded into the binary
symbols 0 and 1. Because of noise in the system, the incorrect symbol
is sometimes received. Suppose that the probability of a 0 being transmitted
is 0.4 and the probability of a 1 being transmitted is 0.6. Further
suppose that the probability of a transmitted 0 being received as a 1 is
0.08 and the probability of a transmitted 1 being received as a 0 is 0.05.
Find:


a) The probability that a received 0 was transmitted as a 0.


b) The probability that a received 1 was transmitted as a 1.


c) The probability that any symbol is received in error. 


A certain typist sometimes makes mistakes by hitting a key to the right 
or left of the intended key, each with a probability of 0.02. The letters 
E, R, and T are adjacent to one another on the standard QWERTY keyboard,
and in English they occur with probabilities of Pr (E) = 0.1031,
Pr (R) = 0.0484, and Pr (T) = 0.0796.


a) What is the probability with which the letter R appears in text typed 
by this typist? 


b) What is the probability that a letter R appearing in text typed by this 
typist will be in error? 


1-7.3 A candy machine has ten buttons of which one never works, two work 
one-half the time, and the rest work all the time. A coin is inserted and 
a button is pushed at random. 


a) What is the probability that no candy is received? 


b) If no candy is received, what is the probability that the button that 
never works was the one pushed? 


c) If candy is received, what is the probability that one of the buttons 
that work one-half the time was the one pushed? 


1—7.4 A fair coin is tossed. If it comes up heads, a single die is rolled. If it 
comes up tails, two dice are rolled. Given that the outcome of the dice 
is 3, but you do not know whether one or two dice were rolled, what is 
the probability that the coin came up heads? 


1-7.5 A communication network has five links as shown below. 





The probability that each link is working is 0.9. What is the probability
of being able to transmit a message from point A to point B? 
1-7.6


1-7.7 


1-7.8 


1-8.1 


A manufacturer buys components in equal amounts from three different 
suppliers. The probability that components from supplier A are bad is 
0.1, that components from supplier B are bad is 0.15, and that compo- 
nents from supplier C are bad is 0.05. Find: 


a) The probability that a component selected at random will be bad.
b) If a component is found to be bad, what is the probability that it
came from supplier B?


An electronics hobbyist has three electronic parts cabinets with two 
drawers each. One cabinet has NPN transistors in each drawer, while a 
second cabinet has PNP transistors in each drawer. The third cabinet has 
NPN transistors in one drawer and PNP transistors in the other drawer. 
The hobbyist selects one cabinet at random and withdraws a transistor 
from one of the drawers. 


a) What is the probability that an NPN transistor will be selected? 


b) Given that the hobbyist selects an NPN transistor, what is the prob- 
ability that it came from the cabinet that contains both types? 


c) Given that an NPN transistor is selected what is the probability that 
it comes from the cabinet that contains only NPN transistors? 


If the Pr (A) > Pr (B), show that Pr (A | B) > Pr (B | A). 


When a pair of dice are rolled, let A be the event of obtaining a number 
of 6 or greater and let B be the event of obtaining a number of 6 or less. 
Are events A and B dependent or independent? 


If A, B, and C are independent events, prove that the following are also 
independent: 

a) A and $B \cup C$.

b) A and $B \cap C$.

c) A and $B - C$.

A pair of dice are rolled. Let A be the event of obtaining an odd number
on the first die and B be the event of obtaining an odd number on the
second die. Let C be the event of obtaining an odd total from both dice.


a) Show that A and B are independent, that A and C are independent, 
and that B and C are independent. 


b) Show that A, B, and C are not mutually independent. 


1-8.4 


1-9.1 


1-9.2 


1-10.1 


1-10.2 
If A is independent of B, prove that:


a) A is independent of B. 
b) A is independent of B. 

A combined experiment is performed in which two coins are flipped and
a single die is rolled. The outcomes from flipping the coins are taken to
be HH, TT, and HT (which is taken to be a single outcome regardless
of which coin is heads and which one is tails). The outcomes from rolling
the die are the integers from one to six.


a) Write all of the elements in the cartesian product space. 
b) Let A be the event of obtaining two heads and a number of 3 or 
less. Find the probability of A. 


An electronic manufacturer uses four different types of ICs in manufacturing
a particular device. The NAND gates (designated as G if good
and $\bar{G}$ if bad) have a probability of 0.05 of being bad. The flip flops (F
and $\bar{F}$) have a probability of 0.1 of being bad, the counters (C and $\bar{C}$)
have a probability of 0.03 of being bad, and the shift registers (S and $\bar{S}$)
have a probability of 0.12 of being bad.

a) Write all of the elements in the product space. 

b) Determine the probability that the manufactured device will work. 


c) If a particular device does not work, determine the probability that 
only the flip flops are bad. 


d) If a particular device does not work, determine the probability that 
both the flip flops and the counters are bad. 

Two men each flip a coin three times. 

a) What is the probability that both men will get exactly two heads 
each? 

b) What is the probability that one man will get no heads and the other 
man will get three heads? 


In playing an opponent of equal ability, which is more probable: 


a) To win 4 games out of 7, or to win 5 games out of 9? 


b) To win at least 4 games out of 7, or to win at least 5 games 
out of 9? 
1-10.3


1-10.4 


1-10.5 


1-10.6 


1-10.7 


1-10.8


Prove that $_nC_r$ is equal to $_{n-1}C_r + {}_{n-1}C_{r-1}$.


A football receiver, Harvey Gladiator, is able to catch two-thirds of the 
passes thrown to him. He must catch three passes for his team to win 
the game. The quarterback throws the ball to Harvey four times. 

a) Find the probability that Harvey will drop the ball all four times. 
b) Find the probability that Harvey will win the game. 

Out of a group of seven EEs and five MEs, a committee consisting of 
three EEs and two MEs is to be formed. In how many ways can this be 
done if: 

a) Any EE and any ME can be included? 

b) One particular EE must be on the committee? 

c) Two particular MEs cannot be on the committee? 

In the digital communication system of Problem 1-7.1, assume that the 
event of an error occurring in one binary symbol is statistically indepen- 
dent of the event of an error occurring in any other binary symbol. Find: 
a) The probability of receiving six successive symbols without error. 


b) The probability of receiving six successive symbols with exactly one 
error. 


c) The probability of receiving six successive symbols with more than 
one error. 
d) The probability of receiving six successive symbols with one or
more errors.


A multichannel microwave link is to provide telephone communication 
to a remote community having 12 subscribers, each of whom uses the 
link 20 percent of the time during peak hours. How many channels are 
needed to make the link available during peak hours to: 

a) Eighty percent of the subscribers all of the time? 

b) All of the subscribers 80 percent of the time? 

c) All of the subscribers 95 percent of the time? 


A manufacturer of electronic equipment buys 1000 ICs for which the 
probability of any one IC being bad is 0.01. 
a) What is the probability that exactly 10 of the ICs are bad?
b) What is the probability that none of the ICs are bad? 
c) What is the probability that exactly 1 of the ICs is bad? 


Random Variables


2—1 Concept of a Random Variable 


The previous chapter deals exclusively with situations in which the number of 
possible outcomes associated with any experiment is finite. Although it is never 
stated that the outcomes had to be finite in number (because, in fact, they do not),
such an assumption is implied and is certainly true for such illustrative experi- 
ments as tossing coins, throwing dice, and selecting resistors from bins. There are 
many other experiments, however, in which the number of possible outcomes is 
not finite, and it is the purpose of this chapter to introduce ways of describing 
such experiments in accordance with the concepts of probability already estab- 
lished. 

A good way to introduce this type of situation is to consider again the experiment
of selecting a resistor from a bin. When mention is made, in the previous
chapter, of selecting a 1-Ω resistor, or a 10-Ω resistor, or any other value, the
implied meaning is that the selected resistor is labeled "1 Ω" or "10 Ω." The
actual value of resistance is expected to be close to the labeled value, but might 
differ from it by some unknown (but measurable) amount. The deviations from 
the labeled value are due to manufacturing variations and can assume any value 
within some specified range. Since the actual value of resistance is unknown in 
advance, it is a random variable. 

To carry this illustration further, consider a bin of resistors that are all marked 
"100 Ω." Because of manufacturing tolerances, each of the resistors in the bin
will have a slightly different resistance value. Furthermore, there are an infinite
number of possible resistance values, so that the experiment of selecting one
resistor has an infinite number of possible outcomes. Even if it is known that all of
the resistance values lie between 99.99 Ω and 100.01 Ω, there are an infinite
number of such values in this range. Thus, if one defines a particular event as the
selection of a resistor with a resistance of exactly 100.00 Ω, the probability of
this event is actually zero. On the other hand, if one were to define an event as
the selection of a resistor having a resistance between 99.9999 Ω and 100.0001
Ω, the probability of this event is nonzero. The actual value of resistance,
however, is a random variable that can assume any value in a specified range of
values. 

It is also possible to associate random variables with time functions, and, in 
fact, most of the applications that are considered in this text are of this type. 
Although Chapter 3 will deal exclusively with such random variables and random 
time functions, it is worth digressing momentarily at this point, to note the rela- 
tionship between the two as it provides an important physical motivation for the 
present study. 

A typical random time function, shown in Figure 2-1, is designated as x(t). In 
a given physical situation, this particular time function is only one of an infinite 
number of time functions that might have occurred. The collection of all possible 
time functions that might have been observed belongs to a random process, which 
will be designated as {x(t)}. When the probability functions are also specified, this
collection is referred to as an ensemble. Any particular member of the ensemble,
say x(t), is a sample function, and the value of the sample function at some
particular time, say $t_1$, is a random variable which we call $X(t_1)$ or simply $X_1$. Thus,
$X_1 = x(t_1)$ when x(t) is the particular sample function observed.

A random variable associated with a random process is a considerably more 
involved concept than the random variable associated with the resistor above. In 
the first place, there is a different random variable for each instant of time, al- 
though there usually is some relation between two random variables corresponding 
to two different time instants. In the second place, the randomness we are con- 
cerned with is the randomness that exists from sample function to sample function 
throughout the complete ensemble. There may also be randomness from time
instant to time instant, but this is not an essential ingredient of a random process.
Therefore, the probability description of the random variables being considered
here is also the probability description of the random process. However, our initial
discussion will concentrate on the random variables and will be extended later to
the random process.

Figure 2-1 A random time function.

From an engineering viewpoint, a random variable is simply a numerical de- 
scription of the outcome of a random experiment. Recall that the sample space 
S = {a} is the set of all possible outcomes of the experiment. When the outcome 
is a, the random variable X has a value that we might denote as X(a). From this 
viewpoint, a random variable is simply a real-valued function defined over the 
sample space—and in fact the fundamental definition of a random variable is sim- 
ply as such a function (with a few restrictions needed for mathematical consis- 
tency). For engineering applications, however, it is usually not necessary to con- 
sider explicitly the underlying sample space. It is generally only necessary to be 
able to assign probabilities to various events associated with the random variables 
of interest and these probabilities can often be inferred directly from the physical 
situation. What events are required for a complete description of the random var- 
iable, and how the appropriate probabilities can be inferred, form the subject mat- 
ter for the rest of this chapter. 

If a random variable can assume any value within a specified range (possibly 
infinite), then it will be designated as a continuous random variable. In the follow- 
ing discussion all random variables will be assumed to be continuous unless stated 
otherwise. It will be shown, however, that discrete random variables (that is, 
those assuming one of a countable set of values) can also be treated by exactly 
the same methods. 


2-2 Distribution Functions


In order to consider continuous random variables within the framework of proba- 
bility concepts discussed in the last chapter, it is necessary to define the events to 
be associated with the probability space. There are many ways in which events 
might be defined, but the method to be described below is almost universally 
accepted. 

Let X be a random variable as defined above and x be any allowed value of this 
random variable. The probability distribution function is defined to be the proba- 
bility of the event that the observed random variable X is less than or equal to the 
allowed value x. That is,¹

$$F_X(x) = \Pr(X \le x)$$

¹The subscript X denotes the random variable while the argument x could equally well be any other
symbol. In much of the subsequent discussion it is convenient to suppress the subscript X when no
confusion will result. Thus $F_X(x)$ will often be written F(x).
Figure 2-2 Some possible probability distribution functions.


Since the probability distribution function is a probability, it must satisfy the 
basic axioms and must have the same properties as the probabilities discussed in 
Chapter 1. However, it is also a function of x, the possible values of the random 
variable X, and as such must generally be defined for all values of x. Thus, the 
requirement that it be a probability imposes certain constraints upon the functional 
nature of $F_X(x)$. These may be summarized as follows:

1. $0 \le F_X(x) \le 1 \qquad -\infty < x < \infty$
2. $F_X(-\infty) = 0 \qquad F_X(\infty) = 1$
3. $F_X(x)$ is nondecreasing as x increases.
4. $\Pr(x_1 < X \le x_2) = F_X(x_2) - F_X(x_1)$


Some possible distribution functions are shown in Figure 2-2. The sketch in (a) 
indicates a continuous random variable having possible values ranging from $-\infty$
to $\infty$ while (b) shows a continuous random variable for which the possible values
lie between a and b. The sketch in (c) shows the probability distribution function
for a discrete random variable that can assume only four possible values (that is,
0, a, b, or c). In distribution functions of this type it is important to remember
that the definition for $F_X(x)$ includes the condition X = x as well as X < x. Thus,
in Figure 2-2(c), it follows (for example) that $F_X(a) = 0.4$ and not 0.2.

The probability distribution function can also be used to express the probability 
of the event that the observed random variable X is greater than (but not equal to) 
x. Since this event is simply the complement of the event having probability $F_X(x)$,
it follows that

$$\Pr(X > x) = 1 - F_X(x)$$


As a specific illustration, consider the probability distribution function shown 
in Figure 2—3. Note that this function satisfies all of the requirements listed above. 
It is easy to see from the figure that the following statements (among many other 
possible statements) are true: 


$$
\begin{aligned}
\Pr(X \le -5) &= 0.25 \\
\Pr(X > -5) &= 1 - 0.25 = 0.75 \\
\Pr(X > 8) &= 1 - 0.9 = 0.1 \\
\Pr(-5 < X \le 8) &= 0.9 - 0.25 = 0.65 \\
\Pr(X > 0) &= 1 - \Pr(X \le 0) = 1 - 0.5 = 0.5
\end{aligned}
$$

Figure 2-3 A specific probability distribution function.
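The statements read from Figure 2-3 can be checked numerically. The following Python sketch assumes, consistent with the density derived for this function in Section 2-3, that the distribution function rises linearly from 0 at x = -10 to 1 at x = 10. The function names are illustrative only and do not appear in the text.

```python
def cdf_fig_2_3(x):
    """F_X(x) assumed for Figure 2-3: linear from 0 at x = -10 to 1 at x = 10."""
    if x <= -10.0:
        return 0.0
    if x > 10.0:
        return 1.0
    return (x + 10.0) / 20.0


def prob_greater(cdf, x):
    """Pr(X > x) = 1 - F_X(x), the complement of the event X <= x."""
    return 1.0 - cdf(x)


def prob_between(cdf, x1, x2):
    """Pr(x1 < X <= x2) = F_X(x2) - F_X(x1), property 4 of the distribution function."""
    return cdf(x2) - cdf(x1)


print("Pr(X <= -5)     =", round(cdf_fig_2_3(-5), 4))
print("Pr(X > -5)      =", round(prob_greater(cdf_fig_2_3, -5), 4))
print("Pr(X > 8)       =", round(prob_greater(cdf_fig_2_3, 8), 4))
print("Pr(-5 < X <= 8) =", round(prob_between(cdf_fig_2_3, -5, 8), 4))
print("Pr(X > 0)       =", round(prob_greater(cdf_fig_2_3, 0), 4))
```

The two helper functions take the distribution function as an argument, so the same calls can be reused with any other F_X(x).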


In the example above, all of the variation of the probability distribution function 
takes place between finite limits. This is not always the case, however. Consider, 
for example, a probability distribution function defined by 


$$F_X(x) = \frac{1}{2}\left(1 + \frac{2}{\pi}\tan^{-1}\frac{x}{5}\right) \qquad -\infty < x < \infty \tag{2-1}$$

and shown in Figure 2-4. Again, there are many different statements that can be
made concerning the probability that the random variable X lies in certain regions.
For example, it is straightforward to verify that all of the following are true:

$$
\begin{aligned}
\Pr(X \le -5) &= 0.25 \\
\Pr(X > -5) &= 1 - 0.25 = 0.75 \\
\Pr(X > 8) &= 1 - 0.8222 = 0.1778 \\
\Pr(-5 < X \le 8) &= 0.8222 - 0.25 = 0.5722 \\
\Pr(X > 0) &= 1 - \Pr(X \le 0) = 0.5
\end{aligned}
$$

Figure 2-4 A probability distribution function with infinite range.
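As a rough numerical check of these values, the sketch below evaluates (2-1) directly with Python's standard `math` module. The name `cdf_2_1` and the grid used for the monotonicity check are arbitrary choices made for this illustration, and a grid check of this kind is only suggestive, not a proof that the function is nondecreasing.

```python
import math


def cdf_2_1(x):
    """Distribution function of (2-1)."""
    return 0.5 * (1.0 + (2.0 / math.pi) * math.atan(x / 5.0))


# Property 3: F_X(x) should be nondecreasing; checked here on a coarse grid only.
grid = [-1000.0 + 0.5 * i for i in range(4001)]
values = [cdf_2_1(x) for x in grid]
is_nondecreasing = all(a <= b for a, b in zip(values, values[1:]))

print("nondecreasing on grid:", is_nondecreasing)
print("Pr(X <= -5)     =", round(cdf_2_1(-5), 4))
print("Pr(X > 8)       =", round(1.0 - cdf_2_1(8), 4))
print("Pr(-5 < X <= 8) =", round(cdf_2_1(8) - cdf_2_1(-5), 4))
print("Pr(X > 0)       =", round(1.0 - cdf_2_1(0), 4))
```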





Exercise 2-2.1 
A random experiment consists of flipping six coins and taking the ran- 
dom variable to be the number of heads. 

a) Sketch the distribution function for this random variable. 


b) What is the probability that the random variable is less than 
3.5? 


c) What is the probability that the random variable is greater than 
2.5? 


d) What is the probability that the random variable is greater than 
1.5 and less than or equal to 5.0? 


Answers: 0.6563, 0.875, 0.6563 
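The listed answers can be reproduced with a short computation. The sketch below assumes fair coins (probability of a head equal to 0.5, which the exercise does not state explicitly) and builds the staircase distribution function of the number of heads. The function name `binomial_cdf` is illustrative.

```python
import math


def binomial_cdf(x, n=6, p=0.5):
    """Pr(X <= x) for X = number of heads in n tosses; x may be any real number."""
    if x < 0:
        return 0.0
    k_max = min(n, math.floor(x))  # largest integer outcome not exceeding x
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_max + 1))


# b) Pr(X < 3.5): no outcome equals 3.5, so this is F_X(3.5)
print(binomial_cdf(3.5))
# c) Pr(X > 2.5) = 1 - F_X(2.5)
print(1.0 - binomial_cdf(2.5))
# d) Pr(1.5 < X <= 5.0) = F_X(5.0) - F_X(1.5)
print(binomial_cdf(5.0) - binomial_cdf(1.5))
```

Because the distribution function is constant between integers, F_X(3.5) equals F_X(3); the jump at each integer k has height equal to Pr(X = k).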


Exercise 2-2.2 
A particular random variable has a probability distribution function 
given by 
$$F_X(x) = \begin{cases} 0 & -\infty < x \le 0 \\ 1 - e^{-x} & 0 < x < \infty \end{cases}$$

Find
a) the probability that X > 0.5
b) the probability that X ≤ 0.25
c) the probability that 0.3 < X ≤ 0.7.


Answers: 0.2212, 0.6065, 0.2442 
2-3 Density Functions


Although the distribution function is a complete description of the probability 
model for a single random variable, it is not the most convenient form for many 
calculations of interest. For these, it may be preferable to use the derivative of 
F(x) rather than F(x) itself. This derivative is called the probability density
function and, when it exists, it is defined by²

$$f_X(x) = \lim_{\varepsilon \to 0} \frac{F_X(x + \varepsilon) - F_X(x)}{\varepsilon} = \frac{dF_X(x)}{dx}$$

The physical significance of the probability density function is best described in
terms of the probability element, $f_X(x)\,dx$. This may be interpreted as

$$f_X(x)\,dx = \Pr(x < X \le x + dx) \tag{2-2}$$


Equation (2-2) simply states that the probability element, $f_X(x)\,dx$, is the
probability of the event that the random variable X lies in the range of possible values
between x and x + dx. 

Since $f_X(x)$ is a density function and not a probability, it is not necessary that
its value be less than 1; it may have any nonnegative value.³ Its general properties
may be summarized as follows:

1. $f_X(x) \ge 0 \qquad -\infty < x < \infty$
2. $\int_{-\infty}^{\infty} f_X(x)\,dx = 1$
3. $F_X(x) = \int_{-\infty}^{x} f_X(u)\,du$
4. $\int_{x_1}^{x_2} f_X(x)\,dx = \Pr(x_1 < X \le x_2)$


As examples of probability density functions, those corresponding to the distri- 
bution functions of Figure 2-2 are shown in Figure 2-5. Note particularly that 
the density function for a discrete random variable consists of a set of delta func- 
tions, each having an area equal to the magnitude of the corresponding disconti- 
nuity in the distribution function. It is also possible to have density functions that 
contain both a continuous part and one or more delta functions. 

²Again, the subscript denotes the random variable and when no confusion results, it may be omitted.
Thus, $f_X(x)$ will often be written as f(x).

³Because $F_X(x)$ is nondecreasing as x increases.

Figure 2-5 Probability density functions corresponding to the distribution functions
of Figure 2-2.

There are many different mathematical forms that might be probability density
functions, but only a very few of these arise to any significant extent in the
analysis of engineering systems. Some of these are considered in subsequent sections
and a table containing numerous density functions is given in Appendix B.

Before considering the more important probability density functions, however, 
let us look at the density functions that are associated with the probability distri- 
bution functions described in the previous section. It is clear from Figure 2-3 that 
the probability density function associated with this random variable must be zero 
for $x \le -10$ and $x > 10$. Furthermore, in the interval between -10 and 10 it
must have a constant value since the slope of the distribution function is constant.
Thus:

$$f_X(x) = \begin{cases} 0 & x \le -10 \\ 0.05 & -10 < x \le 10 \\ 0 & x > 10 \end{cases}$$

This is sketched in Figure 2-6.

Figure 2-6 Probability density function corresponding to the distribution function
of Figure 2-3.

Figure 2-7 Probability density function corresponding to the distribution function
of Figure 2-4.


The probability density function corresponding to the distribution function of 
Figure 2—4 can be obtained by differentiating the distribution function of (2-1). 
Thus, 


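$$f_X(x) = \frac{dF_X(x)}{dx} = \frac{d}{dx}\left[\frac{1}{2}\left(1 + \frac{2}{\pi}\tan^{-1}\frac{x}{5}\right)\right] = \frac{5}{\pi(25 + x^2)} \qquad -\infty < x < \infty \tag{2-3}$$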


This probability density function is displayed in Figure 2-7. 
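The differentiation leading to (2-3) can be sanity-checked numerically by comparing a central-difference estimate of the slope of (2-1) with the closed-form density. This is only a sketch: the step size `h` and the sample points are arbitrary choices, and the function names are not from the text.

```python
import math


def cdf(x):
    """Distribution function of (2-1)."""
    return 0.5 * (1.0 + (2.0 / math.pi) * math.atan(x / 5.0))


def pdf(x):
    """Density of (2-3): 5 / (pi * (25 + x^2))."""
    return 5.0 / (math.pi * (25.0 + x * x))


def numerical_derivative(f, x, h=1e-5):
    """Central-difference estimate of df/dx at x."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


for x in (-20.0, -5.0, 0.0, 3.0, 8.0, 20.0):
    print(x, numerical_derivative(cdf, x), pdf(x))
```

The two printed columns should agree to many decimal places. The largest value of the density, at x = 0, is 1/(5π), and the tails fall off only as 1/x².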

A situation that frequently occurs in the analysis of engineering systems is that 
in which one random variable is functionally related to another random variable 
whose probability density function is known and it is desired to determine the 
probability density function of the first random variable. For example, it may be 
desired to find the probability density function of a power variable when the prob- 
ability density function of the corresponding voltage or current variable is known. 
Or it may be desired to find the probability density function after some nonlinear 
operation is performed on a voltage or current. Although a complete discussion of 
this problem is not necessary here, a few elementary concepts can be presented 
and will be useful in subsequent discussions. 

In order to formulate the mathematical framework, let the random variable Y be 
a single-valued, real function of another random variable X. Thus, Y = g(X),⁴ in
which it is assumed that the probability density function of X is known and is
denoted by $f_X(x)$, and it is desired to find the probability density function of Y,
which is denoted by $f_Y(y)$. If it is assumed for the moment that g(X) is a
monotonically increasing function of X, then the situation shown in Figure 2-8(a)
applies. It is clear that whenever the random variable X lies between x and x + dx,
the random variable Y will lie between y and y + dy. Since the probabilities of
these events are $f_X(x)\,dx$ and $f_Y(y)\,dy$, one can immediately write

$$f_Y(y)\,dy = f_X(x)\,dx$$

from which the desired probability density function becomes

$$f_Y(y) = f_X(x)\,\frac{dx}{dy} \tag{2-4}$$

Of course, in the right side of (2-4), x must be replaced by its corresponding
function of y.

⁴This also implies that the possible values of X and Y are related by y = g(x).

Figure 2-8 Transformation of variables.

When g(X) is a monotonically decreasing function of X, as shown in Figure
2-8(b), a similar result is obtained except that the derivative is negative. Since
probability density functions must be positive, and also from the geometry of the
figure, it is clear that what is needed in (2-4) is simply the absolute value of the
derivative. Hence, for either situation

$$f_Y(y) = f_X(x)\left|\frac{dx}{dy}\right| \tag{2-5}$$








In order to illustrate the transformation of variables, consider first the problem 
of scaling the amplitude of a random variable. Assume that we have a random 
variable X whose probability density function $f_X(x)$ is known. We then consider
another random variable Y that is linearly related to X by Y = AX. This situation
arises, for example, when X is the input to an amplifier and Y is its output. Since
the possible values of X and Y are related in the same way, it follows that

$$\frac{dy}{dx} = A$$

From (2-5) it is clear that the probability density function of Y is

$$f_Y(y) = \frac{1}{|A|}\, f_X\!\left(\frac{y}{A}\right)$$

Thus, it is very easy to find the probability density of any random variable that is
simply a scaled version of another random variable whose density function is
known.
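The scaling rule can be expressed as a small helper that turns any density f_X into the density of Y = AX. In the sketch below the gain A = 3 and the choice of the uniform density of Figure 2-6 as the input are assumptions made purely for illustration, as are all of the function names.

```python
def scaled_density(f_x, gain):
    """Return f_Y for Y = gain * X using f_Y(y) = f_X(y / gain) / |gain|."""
    if gain == 0:
        raise ValueError("gain must be nonzero")

    def f_y(y):
        return f_x(y / gain) / abs(gain)

    return f_y


def f_x_uniform(x):
    """Density of Figure 2-6: 0.05 on (-10, 10], zero elsewhere."""
    return 0.05 if -10.0 < x <= 10.0 else 0.0


def area(f, a, b, steps=8000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


f_y = scaled_density(f_x_uniform, 3.0)  # hypothetical amplifier gain A = 3
print(f_y(0.0))                 # height of the output density near the origin
print(area(f_y, -40.0, 40.0))   # should be close to 1
```

Scaling by A stretches the range of possible values by |A| and lowers the height by the same factor, so the area under the density remains unity.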

Consider next a specific example of the transformation of random variables by 
assuming that the random variable X has a density function of the form 


$$f_X(x) = e^{-x}u(x)$$

where u(x) is the unit step starting at x = 0. Now consider another random
variable Y that is related to X by

$$Y = X^3$$

Since y and x are related in the same way, it follows that

$$\frac{dy}{dx} = 3x^2$$

and

$$\frac{dx}{dy} = \frac{1}{3x^2} = \frac{1}{3y^{2/3}}$$

Thus, the probability density function of Y is

$$f_Y(y) = \frac{1}{3}\,y^{-2/3}\,e^{-y^{1/3}}\,u(y)$$
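One way to check a transformed density is to compare it with the slope of the corresponding distribution function. For this example, integrating f_X gives F_X(x) = 1 - exp(-x) for x > 0, so that F_Y(y) = Pr(X ≤ y^{1/3}) = 1 - exp(-y^{1/3}); this intermediate step is derived here and is not stated in the text. The Python sketch below, with illustrative function names, compares a finite-difference slope of F_Y with the density just obtained.

```python
import math


def f_y(y):
    """Density of Y = X**3 when f_X(x) = exp(-x) u(x)."""
    if y <= 0.0:
        return 0.0  # zero for y < 0; the formula is singular at y = 0
    return math.exp(-(y ** (1.0 / 3.0))) / (3.0 * y ** (2.0 / 3.0))


def F_y(y):
    """Pr(Y <= y) = Pr(X <= y**(1/3)) = 1 - exp(-y**(1/3)) for y > 0."""
    if y <= 0.0:
        return 0.0
    return 1.0 - math.exp(-(y ** (1.0 / 3.0)))


def slope(f, y, h=1e-6):
    """Central-difference estimate of df/dy at y."""
    return (f(y + h) - f(y - h)) / (2.0 * h)


for y in (0.5, 1.0, 8.0, 27.0):
    print(y, slope(F_y, y), f_y(y))
```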

There may also be situations in which, for a given Y, g(X) has regions in which 
the derivative is positive and other regions in which it is negative. In such cases, 
the regions may be considered separately and the corresponding probability den- 
sities added. An example of this sort will serve to illustrate such a transformation. 

Let the functional relationship be

$$Y = X^2$$

This is shown in Figure 2-9 and represents, for example, the transformation
(except for a scale factor) of a voltage random variable into a power random variable.
Since the derivative, dx/dy, has an absolute value given by

$$\left|\frac{dx}{dy}\right| = \frac{1}{2\sqrt{y}}$$

and since there are two x-values for every y-value ($x = \pm\sqrt{y}$), the desired
probability density function is simply

$$f_Y(y) = \frac{1}{2\sqrt{y}}\left[f_X(\sqrt{y}) + f_X(-\sqrt{y})\right] \qquad y \ge 0 \tag{2-6}$$

Furthermore, since y can never be negative,

$$f_Y(y) = 0 \qquad y < 0$$

Figure 2-9 The square law transformation.


Some other applications of random variable transformations are considered later. 
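The square-law result (2-6) translates directly into code. The sketch below applies it to the uniform density of Figure 2-6, chosen here only as a convenient test input; for that input one would expect f_Y(y) = 0.05/√y for 0 < y < 100, since Pr(Y ≤ y) = Pr(-√y ≤ X ≤ √y) = √y/10. The function names are illustrative.

```python
import math


def square_law_density(f_x):
    """Return f_Y for Y = X**2, following (2-6)."""

    def f_y(y):
        if y <= 0.0:
            # f_Y is zero for y < 0, and (2-6) is singular at y = 0
            return 0.0
        root = math.sqrt(y)
        return (f_x(root) + f_x(-root)) / (2.0 * root)

    return f_y


def f_x_uniform(x):
    """Density of Figure 2-6: 0.05 on (-10, 10], zero elsewhere."""
    return 0.05 if -10.0 < x <= 10.0 else 0.0


f_y_uniform = square_law_density(f_x_uniform)
for y in (-1.0, 4.0, 25.0, 121.0):
    print(y, f_y_uniform(y))
```

Both branches x = +√y and x = -√y contribute to the density at each positive y, which is why the two terms are added rather than averaged.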





Exercise 2-3.1
The probability density function of a random variable has the form
$f_X(x) = Ke^{-2x}u(x)$, where u(x) is the unit step function. Find

a) the value of K

b) the probability that X > 1

c) the probability that X ≤ 0.5.


Answers: 0.1353, 0.6321, 2.0 


Exercise 2-3.2


A random variable Y is related to the random variable X of Exercise
2-3.1 by

Y = 6X + 3
Find the probability density function of Y.


Answer: (1/3) exp [-(y - 3)/3] for y ≥ 3





2-4 Mean Values and Moments


One of the most important and most fundamental concepts associated with statis- 
tical methods is that of finding average values of random variables or functions of 
random variables. The concept of finding average values for time functions by
integrating over some time interval, and then dividing by the length of the inter- 
val, is a familiar one to electrical engineers, since operations of this sort are used 
to find the dc component, the root-mean-square value, or the average power of 
the time function. Such time averages may also be important for random functions 
of time, but, of course, have no meaning when considering a single random var- 
iable, which is defined as the value of the time function at a single instant of time. 
Instead, it is necessary to find the average value by integrating over the range of 
possible values that the random variable may assume. Such an operation is
referred to as "ensemble averaging," and the result is the mean value.

Several different notations are in standard use for the mean value but the most 
common ones in engineering literature are⁵

$$\bar{X} = E[X] = \int_{-\infty}^{\infty} x f(x)\,dx \tag{2-7}$$


The symbol E[X] is usually read "the expected value of X" or "the mathematical
expectation of X." It is shown later that, in many cases of practical interest, the
mean value of a random variable is equal to the time average of any sample 
function from the random process to which the random variable belongs. In such 
cases, finding the mean value of a random voltage or current is equivalent to 
finding its dc component; this interpretation will be employed here for illustration. 

The expected value of any function of x can also be obtained by a similar 
calculation. Thus,

$$E[g(X)] = \int_{-\infty}^{\infty} g(x) f(x)\,dx \tag{2-8}$$

A function of particular importance is $g(x) = x^n$, since this leads to the general
moments of the random variable. Thus,

$$\overline{X^n} = E[X^n] = \int_{-\infty}^{\infty} x^n f(x)\,dx \tag{2-9}$$


By far the most important moments of X are those given by n = 1, which is 
the mean value discussed above, and by n = 2, which leads to the mean-square 
value. 

$$\overline{X^2} = E[X^2] = \int_{-\infty}^{\infty} x^2 f(x)\,dx \tag{2-10}$$

The importance of the mean-square value lies in the fact that it may often be
interpreted as being equal to the time average of the square of a random voltage
or current. In such cases, the mean-square value is proportional to the average
power (in a resistor) and its square root is equal to the rms or effective value of
the random voltage or current.

⁵Note that the subscript X has been omitted from f(x) since there is no doubt as to what the random
variable is.

It is also possible to define central moments, which are simply the moments of
the difference between a random variable and its mean value. Thus, the nth central
moment is

$$\overline{(X - \bar{X})^n} = E[(X - \bar{X})^n] = \int_{-\infty}^{\infty} (x - \bar{X})^n f(x)\,dx \tag{2-11}$$

The central moment for n = 1 is, of course, zero, while the central moment for
n = 2 is so important that it carries a special name, the variance, and is usually
symbolized by $\sigma^2$. Thus,

$$\sigma^2 = \overline{(X - \bar{X})^2} = \int_{-\infty}^{\infty} (x - \bar{X})^2 f(x)\,dx \tag{2-12}$$

The variance can also be expressed in an alternative form by using the rules for
the expectations of sums; that is,

$$E[X_1 + X_2 + \cdots + X_m] = E[X_1] + E[X_2] + \cdots + E[X_m]$$

Thus,

$$
\begin{aligned}
\sigma^2 = E[(X - \bar{X})^2] &= E[X^2 - 2X\bar{X} + (\bar{X})^2] \\
&= E[X^2] - 2E[X]\bar{X} + (\bar{X})^2 \\
&= \overline{X^2} - 2\bar{X}\bar{X} + (\bar{X})^2 = \overline{X^2} - (\bar{X})^2
\end{aligned}
\tag{2-13}
$$

and it is seen that the variance is the difference between the mean-square value
and the square of the mean value. The square root of the variance, $\sigma$, is known
as the standard deviation.

In electrical circuits, the variance can often be related to the average power (in 
a resistance) of the ac components of a voltage or current. The square root of the 
variance would be the value indicated by an ac voltmeter or ammeter of the rms 
type that does not respond to direct current (because of capacitive coupling, for 
example). 

In order to illustrate some of the above ideas concerning mean values and mo- 
ments, consider a random variable having a uniform probability density function 
as shown in Figure 2-10. A voltage waveform that would lead to such a proba- 
bility density function might be a sawtooth waveform that varied linearly between 
20 and 40 V. The appropriate mathematical representation for this density
function is

$$f(x) = \begin{cases} 0 & -\infty < x \le 20 \\ \dfrac{1}{20} & 20 < x \le 40 \\ 0 & 40 < x < \infty \end{cases}$$

Figure 2-10 A uniform probability density function.

The mean value of this random variable is obtained by using (2-7). Thus,

$$\bar{X} = \int_{20}^{40} x\,\frac{1}{20}\,dx = \frac{1}{20}\,\frac{x^2}{2}\bigg|_{20}^{40} = \frac{1}{40}(1600 - 400) = 30$$

This value is intuitively the average value of the sawtooth waveform just
described. The mean-square value is obtained from (2-10) as

$$\overline{X^2} = \int_{20}^{40} x^2\,\frac{1}{20}\,dx = \frac{1}{20}\,\frac{x^3}{3}\bigg|_{20}^{40} = \frac{1}{60}(64 - 8)10^3 = 933.3$$

The variance of the random variable can be obtained from either (2-12) or
(2-13). From the latter,

$$\sigma^2 = \overline{X^2} - (\bar{X})^2 = 933.3 - (30)^2 = 33.3$$

On the basis of the assumptions that will be made concerning random
processes, if the sawtooth voltage were measured with a dc voltmeter, the reading
would be 30 V. If it were measured with an rms-reading ac voltmeter (which did
not respond to dc), the reading would be $\sqrt{33.3}$ V.
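The same numbers can be obtained by numerical integration, which is useful when a density does not integrate in closed form. The sketch below uses a simple midpoint rule; the helper name `moment`, the step count, and the variable names are arbitrary choices for this illustration.

```python
def moment(pdf, n, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of x**n * pdf(x) over [a, b]."""
    h = (b - a) / steps
    total = 0.0
    for i in range(steps):
        x = a + (i + 0.5) * h
        total += x**n * pdf(x)
    return total * h


def pdf_uniform(x):
    """Uniform density of Figure 2-10: 1/20 on (20, 40], zero elsewhere."""
    return 1.0 / 20.0 if 20.0 < x <= 40.0 else 0.0


mean = moment(pdf_uniform, 1, 20.0, 40.0)          # dc voltmeter reading
mean_square = moment(pdf_uniform, 2, 20.0, 40.0)
variance = mean_square - mean**2                   # from (2-13)
rms_ac = variance**0.5                             # ac-coupled rms reading

print(round(mean, 3), round(mean_square, 1), round(variance, 1), round(rms_ac, 2))
```

The last value, the square root of the variance, corresponds to the reading of the ac-coupled rms voltmeter described above, a little under 6 V.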


As a second illustration of the determination of the moments of a random vari- 
able, consider the probability density function 


$$f(x) = kx[u(x) - u(x - 1)]$$

The value of k can be determined from the 0th moment of f(x) since that is just
the area of the density function and must be 1. Thus,

$$\int_0^1 kx\,dx = \frac{k}{2} = 1 \qquad \therefore\ k = 2$$

The mean and mean-square value of X may now be calculated readily as

$$\bar{X} = \int_0^1 x(2x)\,dx = \frac{2}{3}$$

$$\overline{X^2} = \int_0^1 x^2(2x)\,dx = \frac{1}{2}$$

From these two quantities the variance becomes

$$\sigma^2 = \overline{X^2} - (\bar{X})^2 = \frac{1}{2} - \left(\frac{2}{3}\right)^2 = \frac{1}{18}$$

Likewise, the 4th moment of X is

$$\overline{X^4} = \int_0^1 x^4(2x)\,dx = \frac{1}{3}$$

and the 4th central moment is given by

$$\overline{(X - \bar{X})^4} = \int_0^1 \left(x - \frac{2}{3}\right)^4 (2x)\,dx = \frac{1}{135}$$
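Because the density in this illustration is a polynomial, its moments can be computed exactly with rational arithmetic. The sketch below uses the closed form E[Xⁿ] = 2/(n + 2), which follows from integrating 2xⁿ⁺¹ over (0, 1), and expands the central moments with the binomial theorem. The function names are illustrative only.

```python
from fractions import Fraction
from math import comb


def raw_moment(n):
    """E[X^n] for f(x) = 2x on 0 < x <= 1: the integral of 2 x^(n+1) is 2/(n+2)."""
    return Fraction(2, n + 2)


def central_moment(n):
    """E[(X - mean)^n], expanded binomially in terms of the raw moments."""
    mean = raw_moment(1)
    return sum(
        comb(n, k) * raw_moment(k) * (-mean) ** (n - k) for k in range(n + 1)
    )


print("mean               =", raw_moment(1))
print("mean-square        =", raw_moment(2))
print("variance           =", central_moment(2))
print("4th moment         =", raw_moment(4))
print("4th central moment =", central_moment(4))
```

Using `Fraction` avoids round-off entirely, so the results can be compared directly with the fractions quoted in the text.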





Exercise 2-4.1


For the random variable of Exercise 2-3.1, find
a) the mean value of X
b) the mean-square value of X
c) the variance of X.


Answers: 1/4, 1/2, 1/2 


Exercise 2-4.2


A random variable X has a probability density function of the form

$$f_X(x) = \tfrac{1}{4}[u(x + 2) - u(x - 2)]$$

For the random variable $Y = X^2$, find

a) the mean value

b) the mean-square value

c) the variance.


Answers: 4/3, 64/45, 16/5 





2-5 The Gaussian Random Variable


Of the various density functions that we shall study, the most important by far is 
the Gaussian or normal density function. There are many reasons for its impor- 
tance, some of which are: 


1. It provides a good mathematical model for a great many different physically
observed random phenomena. Furthermore, the fact that it should be
a good model can be justified theoretically in many cases. 

2. It is one of the few density functions that can be extended to handle an 
arbitrarily large number of random variables conveniently. 

3. Linear combinations of Gaussian random variables lead to new random 
variables that are also Gaussian. This is not true for most other density 
functions. 

4. The random process from which Gaussian random variables are derived can 
be completely specified, in a statistical sense, from a knowledge of all first 
and second moments only. This is not true for other processes. 

5. In system analysis, the Gaussian process is often the only one for which a 
complete statistical analysis can be carried through in either the linear or 
the nonlinear situation. 


The mathematical representation of the Gaussian density function is 


$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[\frac{-(x - \bar{X})^2}{2\sigma^2}\right] \qquad -\infty < x < \infty \tag{2-14}$$

where $\bar{X}$ and $\sigma^2$ are the mean and variance, respectively. The corresponding
distribution function cannot be written in closed form. The shapes of the density
function and distribution function are shown in Figure 2-11. There are a number 
of points in connection with these curves that are worth noting. These are: 


1. There is only one maximum and it occurs at the mean value.

2. The density function is symmetrical about the mean value.

3. The width of the density function is directly proportional to the standard
deviation, $\sigma$. The width of $2\sigma$ occurs at the points where the height is
0.607 of the maximum value. These are also the points of maximum
absolute slope.

4. The maximum value of the density function is inversely proportional to the
standard deviation $\sigma$. Since the density function has an area of unity, it can
be used as a representation of the impulse or delta function by letting $\sigma$
approach zero. That is

$$\delta(x - \bar{X}) = \lim_{\sigma \to 0} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[\frac{-(x - \bar{X})^2}{2\sigma^2}\right] \tag{2-15}$$

This representation of the delta function has an advantage over some others
of being infinitely differentiable.

Figure 2-11 The Gaussian random variable: (a) density function and (b) distribution
function.


The Gaussian distribution function cannot be expressed in closed form in terms 
of elementary functions. It can, however, be expressed in terms of functions that 
are commonly tabulated. From the relation between density and distribution func- 
tions it follows that the general Gaussian distribution function is 


$$F(x) = \int_{-\infty}^{x} f(u)\,du = \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{x} \exp\left[\frac{-(u - \bar{X})^2}{2\sigma^2}\right] du \tag{2-16}$$

The function that is usually tabulated is the distribution function for a Gaussian
random variable that has a mean value of zero and a variance of unity (that is,
$\bar{X} = 0$, $\sigma = 1$). This distribution function is often designated by $\Phi(x)$ and is
defined by

$$\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} \exp\left(-\frac{u^2}{2}\right) du \tag{2-17}$$

By means of a simple change of variable it is easy to show that the general
Gaussian distribution function of (2-16) can be expressed in terms of $\Phi(x)$ by

$$F(x) = \Phi\left(\frac{x - \bar{X}}{\sigma}\right) \tag{2-18}$$


An abbreviated table of values for $\Phi(x)$ is given in Appendix D. Since only
positive values of x are tabulated, it is frequently necessary to use the additional
relationship

$$\Phi(-x) = 1 - \Phi(x) \tag{2-19}$$


Another function that is closely related to $\Phi(x)$, and is often more convenient
to use, is the Q-function defined by

$$Q(x) = \frac{1}{\sqrt{2\pi}} \int_{x}^{\infty} \exp\left(-\frac{u^2}{2}\right) du \tag{2-20}$$

and for which

$$Q(-x) = 1 - Q(x) \tag{2-21}$$

Upon comparing this with (2-17), it is clear that

$$Q(x) = 1 - \Phi(x)$$

Likewise, comparing with (2-18)

$$F(x) = 1 - Q\left(\frac{x - \bar{X}}{\sigma}\right)$$

A brief table of values for Q(x) is given in Appendix E for small values of x.
Several alternative notations are also used in the literature for both $\Phi(x)$ and
Q(x). Some authors use

$$\text{erf}(x) = \Phi(x) \tag{2-22}$$

where erf(x) is called the error function and

$$\text{erfc}(x) = Q(x) \tag{2-23}$$

where erfc(x) is called the complementary error function. Still other authors define
the error function by

$$\text{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x \exp(-u^2)\,du = 2\Phi(\sqrt{2}\,x) - 1 \tag{2-24}$$


This diversity of notation emphasizes the need to carefully determine the defini- 
tions being used whenever the literature is read. 
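The caution about notation applies to software as well. Python's standard `math.erf` and `math.erfc` follow the definition in (2-24), not the one in (2-22) and (2-23), so Φ(x) and Q(x) must be obtained by rescaling the argument. The sketch below shows one way to do this; the function names `phi`, `q`, and `gaussian_cdf` are illustrative.

```python
import math


def phi(x):
    """Phi(x) of (2-17), from (2-24) rearranged: Phi(x) = [1 + erf(x / sqrt(2))] / 2."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def q(x):
    """Q(x) of (2-20); erfc keeps relative accuracy in the far tail."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def gaussian_cdf(x, mean, sigma):
    """General Gaussian distribution function via (2-18)."""
    return phi((x - mean) / sigma)


print(phi(1.0))                      # Phi(1)
print(q(1.0) + phi(1.0))             # Q(x) = 1 - Phi(x), so the sum should be 1
print(phi(-1.3) + phi(1.3))          # (2-19): also 1
print(gaussian_cdf(2.5, 0.5, 1.0))   # same as Phi(2)
```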

Although both $\Phi(x)$ and Q(x) are widely tabulated, there is an advantage in
using Q(x) when tables are not available or the values needed are outside the range 
of tables. This is because there is a relatively simple way to calculate quite accu- 
rate values of Q(x) with an ordinary hand calculator. This computation procedure 
starts by representing Q(x) as 
$$Q(x) = \frac{\exp(-x^2/2)}{\sqrt{2\pi}}\,G(x) \tag{2-25}$$

in which G(x) is a continued fraction defined by

$$G(x) = \cfrac{1}{x + \cfrac{1}{x + \cfrac{2}{x + \cfrac{3}{x + \cfrac{4}{x + \cdots}}}}} \tag{2-26}$$

The next step is to decide upon the number of terms to be used in evaluating G(x).
Increased accuracy is achieved by using more terms, but, of course, the labor
involved is also increased. A rule for selecting the number of terms will be given
shortly, but for the moment, let us assume that we have selected an appropriate
number and have called this number n. We then calculate the following sequence
of values:

$$
\begin{aligned}
p_n &= x + \frac{n}{x} \\
p_{n-1} &= x + \frac{n-1}{p_n} \\
&\;\;\vdots \\
p_1 &= x + \frac{1}{p_2} \\
G(x) &= \frac{1}{p_1}
\end{aligned}
$$

The last value in this sequence is the desired G(x). 

The number of terms required in the expansion for G(x) to achieve a desired 
accuracy depends upon the value of x for which the computation is being made; 
the smaller x is, the larger n must be. As a general rule of thumb, in order to 
achieve six significant figures for the final value of Q(x), the product of x and n 
should be at least 30. 

The Q-function is useful in calculating the probability of events that occur very
rarely. An example will serve to illustrate this application as well as the
technique for evaluating the Q-function. Suppose we have an IC trigger circuit
that is supposed to change state whenever the input voltage exceeds 2.5 volts;
that is, whenever the input goes from a "0" state to a "1" state. Assume that
when the input is in the "0" state the voltage is actually 0.5 volts, but that
there is Gaussian random noise superimposed on this having a variance of 0.2
volts squared. Thus, the input to the trigger circuit can be modeled as a
Gaussian random variable with a mean of 0.5 and a variance of 0.2. We wish to
determine the probability that the circuit will incorrectly trigger as a result
of the random input exceeding 2.5. From the definition of the Q-function, it
follows that the desired probability is just
$Q[(2.5 - 0.5)/\sqrt{0.2}] = Q(4.472)$. Although Q(4.472) is in the table in
Appendix E, for purposes of illustration we will calculate it using the technique
described above. From the rule of thumb, it appears that n should be 7. Thus,


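$$
\begin{aligned}
p_7 &= 4.472 + \frac{7}{4.472} = 6.037 \\
p_6 &= 4.472 + \frac{6}{6.037} = 5.466 \\
p_5 &= 4.472 + \frac{5}{5.466} = 5.387 \\
&\;\;\vdots \\
p_1 &= 4.472 + \frac{1}{4.868} = 4.677 \\
G(4.472) &= 0.2138 \\
Q(4.472) &= \frac{\exp(-10)}{\sqrt{2\pi}}\,(0.2138) = 3.872 \times 10^{-6}
\end{aligned}
$$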


Note that the probability of incorrectly triggering on any one operation is quite 
small. However, over a period of time in which many operations occur, the prob- 
ability can become significant. The probability that false triggering does not occur 
is simply 1 minus the probability that it does occur. Thus, in n operations, the 
probability that false triggering occurs is 


$$\Pr(\text{False Triggering}) = 1 - (1 - 3.872 \times 10^{-6})^n$$

For $n = 10^5$, this probability becomes

$$\Pr(\text{False Triggering}) = 0.321$$
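
The accumulation of a small per-operation probability over many operations can
be checked numerically. The sketch below is an illustration, not part of the
original example; the function name `false_trigger_probability` is hypothetical,
and `log1p`/`expm1` are used only to limit round-off when the single-operation
probability is very small.

```python
import math

def false_trigger_probability(p_single, n_ops):
    """Return 1 - (1 - p_single)**n_ops for independent operations."""
    # (1 - p)^n = exp(n * log(1 - p)); expm1 gives exp(.) - 1 accurately
    return -math.expm1(n_ops * math.log1p(-p_single))

print(false_trigger_probability(3.872e-6, 10**5))
```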


A conclusion that can be drawn from this example is that when there is
appreciable noise in a digital circuit, errors are almost certain to occur sooner
or later.

Although many of the most useful properties of Gaussian random variables will
become apparent only when two or more variables are considered, one that can
be mentioned now is the ease with which high-order central moments can be
determined. The nth central moment, which was defined in (2-11), can be
expressed for a Gaussian random variable as

$$\overline{(X - \bar{X})^n} = \begin{cases} 0 & n \text{ odd} \\ 1 \cdot 3 \cdot 5 \cdots (n - 1)\,\sigma^n & n \text{ even} \end{cases} \tag{2-27}$$

As an example of the use of (2-27), if n = 4, the fourth central moment is
$\overline{(X - \bar{X})^4} = 3\sigma^4$. A word of caution should be noted,
however. The relation between the nth general moment, $\overline{X^n}$, and the
nth central moment is not always as simple as it is for n = 2. In the n = 4
Gaussian case, for example,

$$\overline{X^4} = 3\sigma^4 + 6\sigma^2 (\bar{X})^2 + (\bar{X})^4$$
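
The relation between central and general moments can be made concrete with a
short computation. The sketch below is an illustration under stated assumptions:
the function names are hypothetical, and the general moment is obtained by a
binomial expansion of $[\bar{X} + (X - \bar{X})]^n$, which is a standard identity
rather than a formula given in the text.

```python
from math import comb

def gaussian_central_moment(n, sigma):
    """n-th central moment of a Gaussian random variable, following (2-27)."""
    if n % 2 == 1:
        return 0.0
    product = 1.0
    for odd in range(1, n, 2):         # 1 * 3 * 5 * ... * (n - 1)
        product *= odd
    return product * sigma ** n

def gaussian_general_moment(n, mean, sigma):
    """n-th general moment from the binomial expansion about the mean."""
    return sum(
        comb(n, k) * mean ** (n - k) * gaussian_central_moment(k, sigma)
        for k in range(n + 1)
    )

# Mean 1 and variance 16 (sigma = 4)
print(gaussian_central_moment(4, 4.0), gaussian_general_moment(4, 1.0, 4.0))
```

For n = 4 the expansion reduces to the three-term expression given above.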


Before leaving the subject of Gaussian density functions, it is interesting to
compare the defining equation, (2-14), with the probability associated with
Bernoulli trials for the case of large n as approximated in (1-30). It will be
noted that, except for the fact that k and n are integers, the DeMoivre-Laplace
approximation has the same form as a Gaussian density function with a mean value
of np and a variance of npq. Since the Bernoulli probabilities are discrete, the
exact density function for this case is a set of delta functions that increase in
number as n increases, and as n becomes large the area of these delta functions
follows a Gaussian law.

Another important result closely related to this is the central limit theorem.
This famous theorem concerns the sum of a large number of independent random
variables having the same probability density function. In particular, let the
random variables be $X_1, X_2, \ldots, X_n$ and assume that they all have the
same mean value, m, and the same variance, $\sigma^2$. Then define a normalized
sum as

$$Y = \frac{1}{\sqrt{n}} \sum_{k=1}^{n} (X_k - m) \tag{2-28}$$

Under conditions that are weak enough to be realized by almost any random
variable encountered in real life, the central limit theorem states that the
probability density function for Y approaches a Gaussian density function as n
becomes large regardless of the density function for the Xs. Furthermore,
because of the normalization, the random variable Y will have zero mean and a
variance of $\sigma^2$. The theorem is also true for more general conditions, but
this is not the important aspect here. What is important is to recognize that a
great many random phenomena that arise in physical situations result from the
combined actions of many individual events. This is true for such things as:
thermal agitation of electrons in a conductor, shot noise from electrons or holes
in a vacuum tube or transistor, atmospheric noise, turbulence in a medium, ocean
waves, and many other physical sources of random disturbances. Hence, regardless
of the probability density functions of the individual components (and these
density functions are usually not even known), one would expect to find that the
observed disturbance has a Gaussian density function. The central limit theorem
provides a theoretical justification for assuming this, and, in almost all cases,
experimental measurements bear out the soundness of this assumption.
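
The tendency described by the central limit theorem can be observed in a small
simulation. The sketch below is only an illustration: the choice of uniformly
distributed components on (0, 1), the number of terms, the number of trials, and
the seed are all assumptions made here. It forms the normalized sum of (2-28)
and reports the sample mean, the sample variance, and the fraction of samples
lying within one standard deviation of zero, which for a Gaussian density would
be about 0.683.

```python
import math
import random

def normalized_sum(n, rng):
    """One sample of Y = (1/sqrt(n)) * sum(X_k - m), X_k uniform on (0, 1)."""
    m = 0.5                                    # mean of each X_k
    return sum(rng.random() - m for _ in range(n)) / math.sqrt(n)

def simulate(n=30, trials=20000, seed=1):
    rng = random.Random(seed)
    sigma = math.sqrt(1.0 / 12.0)              # standard deviation of each X_k
    ys = [normalized_sum(n, rng) for _ in range(trials)]
    mean = sum(ys) / trials
    var = sum((y - mean) ** 2 for y in ys) / trials
    within = sum(abs(y) <= sigma for y in ys) / trials
    return mean, var, within

mean, var, within = simulate()
print(mean, var, within)
```

The sample variance should be close to 1/12, the variance of each component,
in agreement with the statement that Y has the same variance as the Xs.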





Exercise 2-5.1

A Gaussian random variable has a mean value of 1 and a variance of 16. Find:

a) the probability that the random variable has a negative value

b) the probability that the random variable has a value between 1 and 2

c) the probability that the random variable is greater than 4.

Answers: 0.0987, 0.2266, 0.4013


Exercise 2-5.2

For the random variable of Exercise 2-5.1, find

a) the fourth central moment

b) the fourth moment

c) the third central moment

d) the third moment.

Answers: 0, 49, 768, 865





2-6 Density Functions Related to Gaussian


The previous section has indicated some of the reasons for the tremendous
importance of the Gaussian density function. Still another reason is that there
are many other probability density functions, which arise in practical
applications, that are related to the Gaussian density function and can be
derived from it. The purpose of this section is to list some of these other
density functions and indicate the situations under which they arise. They will
not all be derived here, since in most cases insufficient background is
available, but several of the more important ones will be derived as
illustrations of particular techniques.


Distribution of power. When the voltage or current in a circuit is the random
variable, the power dissipated in a resistor is also a random variable that is
proportional to the square of the voltage or current. The transformation that
applies in this case is discussed in Section 2-3 and is used here to determine
the probability density function associated with the power of a Gaussian voltage
or current. In particular, let I be the random variable $I(t_1)$ and assume that
$f_I(i)$ is Gaussian. The power random variable, W, is then given by

$$W = RI^2$$

and it is desired to find its probability density function $f_W(w)$. By analogy
to the result in (2-6), this probability density function may be written as

$$f_W(w) = \begin{cases} \dfrac{1}{2\sqrt{Rw}} \left[ f_I\!\left(\sqrt{\dfrac{w}{R}}\right) + f_I\!\left(-\sqrt{\dfrac{w}{R}}\right) \right] & w \ge 0 \\ 0 & w < 0 \end{cases} \tag{2-29}$$

If I is Gaussian and assumed to have zero mean, then

$$f_I(i) = \frac{1}{\sqrt{2\pi}\,\sigma_I} \exp\left(-\frac{i^2}{2\sigma_I^2}\right)$$

where $\sigma_I^2$ is the variance of I. Hence, $\sigma_I$ has the physical
significance of being the rms value of the current. Furthermore, since the
density function is symmetrical, $f_I(i) = f_I(-i)$. Thus, the two terms of
(2-29) are identical and the probability density function of the power becomes


$$f_W(w) = \begin{cases} \dfrac{1}{\sigma_I \sqrt{2\pi R w}} \exp\left(-\dfrac{w}{2R\sigma_I^2}\right) & w \ge 0 \\ 0 & w < 0 \end{cases} \tag{2-30}$$

This density function is sketched in Figure 2-12. Straightforward calculation
indicates that the mean value of the power is

$$\bar{W} = E[RI^2] = R\sigma_I^2$$

and the variance of the power is

$$\sigma_W^2 = \overline{W^2} - (\bar{W})^2 = E[R^2 I^4] - (\bar{W})^2 = 3R^2\sigma_I^4 - (R\sigma_I^2)^2 = 2R^2\sigma_I^4$$

It may be noted that the probability density function for the power is infinite
at w = 0; that is, the most probable value of power is zero. This is a
consequence of the fact that the most probable value of current is also zero and
that the derivative of the transformation (dW/dI) is zero here. It is important
to note, however, that there is not a delta function in the probability density
function.

Figure 2-12 Density function for the power of a Gaussian current.

The probability distribution function for the power can be obtained, in
principle, by integrating the probability density function for the power.
However, this integration does not result in a closed-form result. Nevertheless,
it is possible to obtain the desired probability distribution function quite
readily by employing the basic definition. Specifically, the probability that the
power is less than or equal to some value w is just the same as the probability
that the current is between the values of $+\sqrt{w/R}$ and $-\sqrt{w/R}$. Thus,
since I is assumed to be Gaussian with zero mean and variance $\sigma_I^2$, the
probability distribution function for the power becomes

$$
\begin{aligned}
F_W(w) &= \Pr\left[I \le \sqrt{w/R}\right] - \Pr\left[I \le -\sqrt{w/R}\right]
        = \Phi\!\left(\frac{\sqrt{w/R}}{\sigma_I}\right) - \Phi\!\left(-\frac{\sqrt{w/R}}{\sigma_I}\right) \\
       &= 2\Phi\!\left(\frac{\sqrt{w/R}}{\sigma_I}\right) - 1 \qquad w \ge 0 \\
       &= 0 \qquad w < 0
\end{aligned}
$$


As an illustration of the use of the power distribution function, consider the
power delivered to a loudspeaker in a typical stereo system. Assume that the
speaker has a resistance of 4 ohms and is rated for a maximum power of 25 watts.
If the current driving the speaker is assumed to be Gaussian and at a level that
provides an average power of 4 watts, what is the probability that the maximum
power level of the speaker will be exceeded? Since 4 watts dissipated in 4 ohms
implies a value of $\sigma_I = 1$, the limiting current is $\sqrt{25/4} = 2.5$
and it follows that

$$\Pr(W > 25) = 1 - F_W(25) = 2\left[1 - \Phi(2.5)\right] = 2(1 - 0.9938) = 0.0124$$

This probability implies that the maximum speaker power is exceeded several
times per second for a Gaussian signal. The situation is probably worse than this
in an actual case because the probability density function of music is not
Gaussian, but tends to have peak values that are more probable than that
predicted by the Gaussian assumption.
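
The density and distribution functions for the power lend themselves to a direct
numerical check. The following Python sketch is illustrative only: the function
names `phi`, `power_pdf`, and `power_cdf` are hypothetical, and the standard
Gaussian distribution function is evaluated through `math.erf` rather than from
a table.

```python
import math

def phi(x):
    """Standard Gaussian probability distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def power_pdf(w, R, sigma_i):
    """Density (2-30) of W = R * I**2 for a zero-mean Gaussian current."""
    if w <= 0:
        return 0.0       # the density is unbounded at w = 0; 0.0 is returned there
    return (math.exp(-w / (2.0 * R * sigma_i ** 2))
            / (sigma_i * math.sqrt(2.0 * math.pi * R * w)))

def power_cdf(w, R, sigma_i):
    """Distribution function of the power: 2 * Phi(sqrt(w/R) / sigma_I) - 1."""
    if w < 0:
        return 0.0
    return 2.0 * phi(math.sqrt(w / R) / sigma_i) - 1.0

R = 4.0                                  # speaker resistance, ohms
sigma_i = math.sqrt(4.0 / R)             # rms current for 4 W average power
print(1.0 - power_cdf(25.0, R, sigma_i)) # probability of exceeding 25 W
```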





Exercise 2-6.1

A Gaussian random voltage having a mean value of zero and a standard
deviation of 10 V is applied to a resistance of 4 Ω. Find:

a) the approximate probability that the power dissipated in the resistance
is between 9.9 watts and 10.1 watts (use the power density function)

b) the probability that the power dissipated in the resistor is greater than
25 watts

c) the probability that the power dissipated in the resistor is less than or
equal to 10 watts.

Answers: 0.00616, 0.3174, 0.472





Rayleigh distribution. The Rayleigh probability density function arises in
several different physical situations. For example, it will be shown later that
the peak values (that is, the envelope) of a random voltage or current having a
Gaussian probability density function will follow the Rayleigh density function.
The original derivation of this density function (by Lord Rayleigh in 1880) was
applied to the envelope of the sum of many sine waves of different frequencies.
It also arises in connection with the errors associated with the aiming of
firearms, missiles, and other projectiles, if the errors in each of the two
rectangular coordinates have independent Gaussian probability densities. Thus,
if the origin of a rectangular coordinate system is taken to be the target and
the error along one axis is X and the error along the other axis is Y, the total
miss distance is simply

$$R = \sqrt{X^2 + Y^2}$$

When X and Y are independent Gaussian random variables with zero mean and
equal variances, $\sigma^2$, the probability density function for R is

$$f_R(r) = \begin{cases} \dfrac{r}{\sigma^2} \exp\left(-\dfrac{r^2}{2\sigma^2}\right) & r \ge 0 \\ 0 & r < 0 \end{cases} \tag{2-31}$$

This is the Rayleigh probability density function and is sketched in Figure 2-13
for two different values of $\sigma^2$. Note that the maximum value of the
density function occurs at $r = \sigma$, but that the density function is not
symmetrical about this maximum point.

Figure 2-13 The Rayleigh probability density function.
The mean value of the Rayleigh-distributed random variable is easily computed
from

$$\bar{R} = \int_0^\infty r f_R(r)\,dr = \int_0^\infty \frac{r^2}{\sigma^2} \exp\left(-\frac{r^2}{2\sigma^2}\right) dr = \sqrt{\frac{\pi}{2}}\,\sigma$$

and the mean-square value from

$$\overline{R^2} = \int_0^\infty r^2 f_R(r)\,dr = \int_0^\infty \frac{r^3}{\sigma^2} \exp\left(-\frac{r^2}{2\sigma^2}\right) dr = 2\sigma^2$$

The variance of R is therefore given by

$$\sigma_R^2 = \overline{R^2} - (\bar{R})^2 = \left(2 - \frac{\pi}{2}\right)\sigma^2 = 0.429\sigma^2$$

Note that this variance is not the same as the variance $\sigma^2$ of the
Gaussian random variables that generate the Rayleigh random variable. It may also
be noted that, unlike the Gaussian density function, both the mean and variance
depend upon a single parameter ($\sigma^2$) and cannot be adjusted independently.

It is straightforward to find the probability distribution function for the
Rayleigh random variable because the density function can be integrated readily.
Thus,

$$F_R(r) = \begin{cases} \displaystyle\int_0^r \frac{u}{\sigma^2} \exp\left(-\frac{u^2}{2\sigma^2}\right) du = 1 - \exp\left(-\frac{r^2}{2\sigma^2}\right) & r \ge 0 \\ 0 & r < 0 \end{cases} \tag{2-32}$$

As an example of the Rayleigh density function, consider an aiming problem in
which an archer shoots at a target two feet in diameter and for which the
bulls-eye is centered on the origin of an XY coordinate system. The position at
which any arrow strikes the target is a random variable having an X-component and
a Y-component. It is determined that the standard deviation of these components
is 1/4 foot; that is, $\sigma_X = \sigma_Y = 1/4$. On the assumption that the X
and Y components of the hit position are independent Gaussian random variables,
the distance from the hit position to the center of the target (i.e., the miss
distance) is a Rayleigh distributed random variable for which the probability
density function is

$$f_R(r) = 16r \exp(-8r^2) \qquad r \ge 0$$

Using the results obtained above, the mean value of the miss distance becomes
$\bar{R} = \sqrt{\pi/2}\,(1/4) = 0.313$ feet and its standard deviation is
$\sigma_R = \sqrt{0.429}\,(1/4) = 0.164$ feet. From the distribution function the
probability that the target will be missed completely is

$$\Pr(\text{Miss}) = 1 - F_R(1) = 1 - \left[1 - \exp\left(-\frac{1}{2(1/4)^2}\right)\right] = e^{-8} = 3.35 \times 10^{-4}$$

Similarly, if the bulls-eye is two inches in diameter, the probability of making
a bulls-eye is

$$\Pr(\text{Bulls-eye}) = F_R\left(\frac{1}{12}\right) = 1 - \exp\left(-\frac{8}{144}\right) = 0.0540$$

Obviously, this example describes an archer who is not very skillful, in spite of
the fact that he rarely misses the entire target!
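
The numbers in the archery example follow directly from (2-32) and the
expressions for the mean and variance. The sketch below simply evaluates them;
the function names are hypothetical and distances are expressed in feet.

```python
import math

def rayleigh_cdf(r, sigma):
    """Rayleigh distribution function (2-32)."""
    if r < 0:
        return 0.0
    return 1.0 - math.exp(-r * r / (2.0 * sigma * sigma))

def rayleigh_mean(sigma):
    return math.sqrt(math.pi / 2.0) * sigma

def rayleigh_std(sigma):
    return math.sqrt(2.0 - math.pi / 2.0) * sigma

sigma = 0.25                                    # standard deviation of X and Y, feet
p_miss = 1.0 - rayleigh_cdf(1.0, sigma)         # target radius is 1 foot
p_bulls_eye = rayleigh_cdf(1.0 / 12.0, sigma)   # bulls-eye radius is 1 inch
print(rayleigh_mean(sigma), rayleigh_std(sigma), p_miss, p_bulls_eye)
```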





Exercise 2-6.2


An amateur marksman fires a pistol at a target 8 inches in diameter. 
It is determined that the probability that he will miss the target entirely 
is 0.01. Find the mean miss distance (from the center of the target) 
for all shots fired. 


Answer: 1.652 





Maxwell distribution. A classical problem in thermodynamics is that of
determining the probability density function of the velocity of a molecule in a
perfect gas. The basic assumption is that each component of velocity is Gaussian
with zero mean and a variance of $\sigma^2 = kT/m$, where k is Boltzmann's
constant, T is the absolute temperature, and m is the mass of the molecule. The
total velocity is, therefore,

$$V = \sqrt{V_x^2 + V_y^2 + V_z^2}$$

and is said to have a Maxwell distribution. The resulting probability density
function can be shown to be

$$f_V(v) = \begin{cases} \sqrt{\dfrac{2}{\pi}}\,\dfrac{v^2}{\sigma^3} \exp\left(-\dfrac{v^2}{2\sigma^2}\right) & v \ge 0 \\ 0 & v < 0 \end{cases} \tag{2-33}$$


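The mean value of a Maxwellian-distributed random variable (the average
molecule velocity) can be found in the usual way and is

$$\bar{V} = \sqrt{\frac{8}{\pi}}\,\sigma$$

The mean-square value and variance can be shown to be

$$\overline{V^2} = 3\sigma^2$$

$$\sigma_V^2 = \overline{V^2} - (\bar{V})^2 = \left(3 - \frac{8}{\pi}\right)\sigma^2 = 0.453\sigma^2$$

The mean kinetic energy can be obtained from $\overline{V^2}$ since

$$e = \frac{1}{2} m V^2$$

and

$$E[e] = \frac{1}{2} m \overline{V^2} = \frac{3}{2} m \sigma^2 = \frac{3}{2} m \left(\frac{kT}{m}\right) = \frac{3}{2} kT$$

which is the classical result.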

The probability distribution function for the Maxwell density cannot be
expressed readily in terms of elementary functions, or even in terms of tabulated
functions. Thus, in most cases involving this distribution function, it is
necessary to carry out the integration numerically. As an illustration of the
Maxwell distribution, suppose we attempt to determine the probability that a
given gas molecule will have a kinetic energy that is more than twice the mean
value of kinetic energy for all the molecules. Since the kinetic energy is given
by

$$e = \frac{1}{2} m V^2$$

and the mean kinetic energy is just $(3/2)m\sigma^2$, the velocity of a molecule
having more than twice the mean kinetic energy is

$$V > \sqrt{6}\,\sigma$$

The probability that a molecule will have a velocity in this range is

$$\Pr(V > \sqrt{6}\,\sigma) = \int_{\sqrt{6}\sigma}^{\infty} \sqrt{\frac{2}{\pi}}\,\frac{v^2}{\sigma^3} \exp\left(-\frac{v^2}{2\sigma^2}\right) dv$$

This can be integrated numerically to yield

$$\Pr(e > 2\bar{e}) = \Pr(V > \sqrt{6}\,\sigma) = 0.1116$$
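
The numerical integration mentioned above is easy to carry out. The following
Python sketch is one possible way of doing it, not necessarily the method used
to obtain the value quoted in the text: it applies Simpson's rule to the Maxwell
density (2-33) between the lower limit and an upper limit of 12 standard
deviations, beyond which the remaining area is assumed to be negligible. The
function names, the upper limit, and the number of steps are assumptions.

```python
import math

def maxwell_pdf(v, sigma):
    """Maxwell density (2-33)."""
    if v < 0:
        return 0.0
    return (math.sqrt(2.0 / math.pi) * v * v / sigma ** 3
            * math.exp(-v * v / (2.0 * sigma * sigma)))

def maxwell_tail(v0, sigma, upper_sigmas=12.0, steps=20000):
    """Pr(V > v0) by Simpson's rule on [v0, upper_sigmas * sigma]."""
    a, b = v0, upper_sigmas * sigma
    h = (b - a) / steps                        # steps must be even
    total = maxwell_pdf(a, sigma) + maxwell_pdf(b, sigma)
    for i in range(1, steps):
        weight = 4.0 if i % 2 else 2.0         # Simpson weights 4, 2, 4, ...
        total += weight * maxwell_pdf(a + i * h, sigma)
    return total * h / 3.0

sigma = 1.0                                    # the result does not depend on sigma
print(maxwell_tail(math.sqrt(6.0) * sigma, sigma))
```

Because the limit of integration scales with $\sigma$, the same probability is
obtained for any gas temperature or molecular mass.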








Exercise 2-6.3

In a certain gas at 300 K, it is found that the number of molecules
having velocities in the vicinity of $1 \times 10^3$ meters/second is twice as
great as the number of molecules having velocities in the vicinity of
$5 \times 10^3$ meters/second. Find:

a) the mean velocity of the molecules

b) the mass of the molecules.

Answers: 2794.9, $1.35 \times 10^{-27}$





Chi-square distribution. A generalization of the above results arises if one
defines a random variable as

$$X^2 = Y_1^2 + Y_2^2 + \cdots + Y_n^2 \tag{2-34}$$

where $Y_1, Y_2, \ldots, Y_n$ are independent Gaussian random variables with 0
mean and variance 1. The random variable $X^2$ is said to have a chi-square
distribution with n degrees of freedom and the probability density function is

$$f(x^2) = \begin{cases} \dfrac{(x^2)^{n/2 - 1}}{2^{n/2}\,\Gamma(n/2)} \exp\left(-\dfrac{x^2}{2}\right) & x^2 \ge 0 \\ 0 & x^2 < 0 \end{cases} \tag{2-35}$$

With suitable normalization of random variables (so as to obtain unit variance),
the power distribution discussed above is seen to be chi-square with n = 1.
Likewise, in the Rayleigh distribution, the square of the miss distance ($R^2$)
is chi-square with n = 2; and in the Maxwell distribution, the square of the
velocity ($V^2$) is chi-square with n = 3. This latter case would lead to the
probability density function of molecule energies.

The mean and variance of a chi-square random variable are particularly simple
because of the initial assumption of unit variance for the components. Thus,

$$\overline{X^2} = n$$

$$(\sigma_{X^2})^2 = 2n$$

The chi-square distribution arises in many signal detection problems in which 
one is sampling an observed voltage and attempting to decide if it is just noise or 
if it contains a signal also. If the observed voltage is just noise, then the samples 
have zero mean and the chi-square distribution described above applies. If, how- 
ever, there is also a signal in the observed voltage, the mean value of the samples 
is not zero. The random variable that results from summing the squares of the 
samples as in (2-34) now has a non-central chi-square distribution. Although 
detection problems of the sort described here are extremely important, further 
discussion of this application of the chi-square distribution is beyond the scope of 
this book. 





Exercise 2-6.4


Ten independent samples of a Gaussian voltage are taken and each 
sample is found to have zero mean and a variance of 16. A new ran- 
dom variable is constructed by summing the squares of these sam- 
ples. Find: 


a) the mean 
b) the variance of this new random variable. 


Answers: 5120, 160 





Log-normal distribution. A somewhat different relationship to the Gaussian 
distribution arises in the case of random variables that are defined as the loga- 
rithms of other random variables. For example, in communication systems the 
attenuation of the signal power in the transmission path is frequently expressed in 
units of nepers, and is calculated from 


$$A = \ln\left(\frac{W_{\text{out}}}{W_{\text{in}}}\right) \text{ nepers}$$

where $W_{\text{in}}$ and $W_{\text{out}}$ are the input and output signal powers
respectively. An experimentally observed fact is that the attenuation A is very
often quite close to being a Gaussian random variable. The question that arises,
therefore, concerns the probability density function of the power ratio.

In order to generalize this result somewhat, let two random variables be related
by

$$Y = \ln X$$

or, equivalently, by

$$X = e^Y$$

and assume that Y is Gaussian with a mean of $\bar{Y}$ and a variance
$\sigma_Y^2$. By using (2-5) it is easy to show that the probability density
function of X is

$$f_X(x) = \begin{cases} \dfrac{1}{\sqrt{2\pi}\,\sigma_Y x} \exp\left[-\dfrac{(\ln x - \bar{Y})^2}{2\sigma_Y^2}\right] & x \ge 0 \\ 0 & x < 0 \end{cases} \tag{2-36}$$


This is the log-normal probability density function. In engineering work base 10
is frequently used for the logarithm rather than base e, but it is simple to
convert from one to the other. Some typical density functions are sketched in
Figure 2-14.

The mean and variance of the log-normal random variable can be evaluated in
the usual manner and become

$$\bar{X} = \exp\left(\bar{Y} + \tfrac{1}{2}\sigma_Y^2\right)$$

$$\sigma_X^2 = \left[\exp(\sigma_Y^2) - 1\right] \exp\left[2\left(\bar{Y} + \tfrac{1}{2}\sigma_Y^2\right)\right]$$

Figure 2-14 The log-normal probability density function.

The distribution function for the log-normal random variable cannot be
expressed in terms of elementary functions. If calculations involving the
distribution function are required, it is usually necessary to carry out the
integration by numerical methods.
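
The density (2-36) and the expressions for the mean and variance can be
evaluated with a few lines of code. The sketch below is illustrative; the
function names are hypothetical. The expression used for the most probable value,
$\exp(\bar{Y} - \sigma_Y^2)$, is a standard result obtained by setting the
derivative of (2-36) to zero and is not stated in the text.

```python
import math

def lognormal_pdf(x, mean_y, sigma_y):
    """Log-normal density (2-36); mean_y and sigma_y describe Y = ln X."""
    if x <= 0:
        return 0.0
    z = (math.log(x) - mean_y) / sigma_y
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma_y * x)

def lognormal_mean(mean_y, sigma_y):
    return math.exp(mean_y + 0.5 * sigma_y ** 2)

def lognormal_variance(mean_y, sigma_y):
    return (math.exp(sigma_y ** 2) - 1.0) * math.exp(2.0 * (mean_y + 0.5 * sigma_y ** 2))

def lognormal_mode(mean_y, sigma_y):
    # Location of the maximum of the density (standard result, assumed here)
    return math.exp(mean_y - sigma_y ** 2)

print(lognormal_mean(2.0, 1.0), lognormal_variance(2.0, 1.0), lognormal_mode(2.0, 1.0))
```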





Exercise 2-6.5

A log-normal random variable is generated by a Gaussian random 
variable having a mean value of 2 and a variance of 1. 


a) Find the most probable value of the log-normal random vari- 
able. 


b) Repeat if the Gaussian random variable has a mean value of 4 
and a variance of 6. 


Answers: 2.718, 0.1353 





2-7 Other Probability Density Functions


In addition to the density functions that are related to the Gaussian, there are many 
others that frequently arise in engineering. Some of these are described here and 
an attempt is made to discuss briefly the situations in which they arise. 


Uniform distribution. The uniform distribution was mentioned in an earlier
section and used for illustrative purposes; it is generalized here. The uniform
distribution usually arises in physical situations in which there is no preferred
value for the random variable. For example, events that occur at random instants
of time (such as the emission of radioactive particles) are often assumed to
occur at times that are equally probable. The unknown phase angle associated with
a sinusoidal source is usually assumed to be uniformly distributed over a range
of $2\pi$ radians. The time position of pulses in a periodic sequence of pulses
(such as a radar transmission) may be assumed to be uniformly distributed over an
interval of one period, when the actual time position with respect to zero time
is unknown. All of these situations will be employed in future examples.

The uniform probability density function may be represented generally as

$$f_X(x) = \begin{cases} \dfrac{1}{x_2 - x_1} & x_1 < x \le x_2 \\ 0 & \text{otherwise} \end{cases} \tag{2-37}$$

It is quite straightforward to show that

$$\bar{X} = \frac{1}{2}(x_1 + x_2) \tag{2-38}$$

and

$$\sigma_X^2 = \frac{1}{12}(x_2 - x_1)^2 \tag{2-39}$$

The probability distribution function of a uniformly distributed random variable
is obtained easily from the density function by integration. The result is

$$F_X(x) = \begin{cases} 0 & x \le x_1 \\ \dfrac{x - x_1}{x_2 - x_1} & x_1 < x \le x_2 \\ 1 & x > x_2 \end{cases} \tag{2-40}$$

One of the important applications of the uniform distribution is in describing
the errors associated with analog-to-digital conversion. This operation takes a
continuous signal that can have any value at a given time instant and converts it
into a binary number having a fixed number of binary digits. Since a fixed number
of binary digits can represent only a discrete set of values, the difference
between the actual value and the closest discrete value represents the error.
This is illustrated in Figure 2-15. In order to determine the mean-square value
of the error, it is assumed that the error is uniformly distributed over an
interval from $-\Delta x/2$ to $\Delta x/2$ where $\Delta x$ is the difference
between the two closest levels. Thus, from (2-38), the mean error is zero, and
from (2-39) the variance or mean-square error is $\frac{1}{12}(\Delta x)^2$.
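
The mean-square quantizing error can be checked numerically. The sketch below is
an illustration under stated assumptions: the quantizer rounds to the nearest
multiple of the step size, the input range of -10 to +10 and the 64 levels are
arbitrary choices, and a dense set of evenly spaced inputs stands in for a
uniformly distributed signal. The function names are hypothetical.

```python
import math

def quantize(x, step):
    """Round x to the nearest multiple of step."""
    return step * math.floor(x / step + 0.5)

def mean_square_error(step, lo=-10.0, hi=10.0, samples=100000):
    """Average squared quantizing error over evenly spaced inputs in (lo, hi)."""
    width = (hi - lo) / samples
    total = 0.0
    for i in range(samples):
        x = lo + (i + 0.5) * width
        total += (x - quantize(x, step)) ** 2
    return total / samples

step = 20.0 / 64                       # 64 levels spanning a 20-volt range
print(mean_square_error(step), step ** 2 / 12.0)
```

The two printed values should agree closely, which is what the assumption of a
uniformly distributed error predicts.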
Figure 2-15 Error in analog-to-digital conversion.

The uniform probability density function also arises quite naturally when
dealing with sinusoidal time functions in which the phase is a random variable.
For example, if a sinusoidal signal is transmitted at one point and received at a
distant point, the phase of the received signal is truly a random variable when
the path over which the signal travels is many wavelengths long. Since there is
no physical reason for any one phase angle to be preferred over any other angle,
the usual assumption is that the phase is uniformly distributed over a range of
$2\pi$. In order to illustrate this, suppose we have a time function of the form

$$x(t) = \cos(\omega t - \theta)$$

The phase angle $\theta$ is assumed to be a random variable whose probability
density function is

$$f_\theta(\theta) = \begin{cases} \dfrac{1}{2\pi} & 0 \le \theta \le 2\pi \\ 0 & \text{elsewhere} \end{cases}$$

From the previous discussion of the uniform density function, it is clear that
the mean value of $\theta$ is

$$\bar{\theta} = \pi$$

and the variance of $\theta$ is

$$\sigma_\theta^2 = \frac{(2\pi)^2}{12} = \frac{\pi^2}{3}$$

It should also be noted that one could have just as well defined the region over
which $\theta$ exists to be $-\pi$ to $+\pi$, or any other region spanning
$2\pi$. Such a choice would not change the variance of $\theta$ at all, but it
would change the mean value.





Exercise 2-7.1


A continuous signal that can assume any value between — 10 V and 
+10 V with equal probability is converted to digital form by quantizing. 


a) How many discrete levels are required in order for the mean- 
square value of the quantizing error to be 0.01 volts squared? 


b) If the number of discrete levels is to be a power of 2 in order 
to efficiently encode the levels into a binary number, how many 
levels are required to keep the mean-square value of the quan- 
tizing error not greater than 0.01 volts squared? 


c) If the number of levels of part (b) are used, what is the actual 
mean-square quantizing error? 


Answers: 0.003, 142, 256


Exponential and related distributions. It was noted in the discussion of the
uniform distribution that events occurring at random time instants are often
assumed to occur at times that are equally probable. Thus, if the average time
interval between events is denoted $\bar{\tau}$, then the probability that an
event will occur in a time interval $\Delta t$ that is short compared to
$\bar{\tau}$ is just $\Delta t/\bar{\tau}$ regardless of where that time interval
is. From this assumption it is possible to derive the probability distribution
function (and, hence, the density function) for the time interval between events.

In order to carry out this derivation, consider the sketch in Figure 2-16. It is
assumed that an event has occurred at time $t_0$, and it is desired to determine
the probability that the next event will occur at a random time lying between
$t_0 + \tau$ and $t_0 + \tau + \Delta t$. If the distribution function for
$\tau$ is $F(\tau)$, then this probability is just
$F(\tau + \Delta t) - F(\tau)$. But the probability that the event occurred in
the $\Delta t$ interval must also be equal to the product of the probabilities of
the independent events that the event did not occur between $t_0$ and
$t_0 + \tau$ and the event that it did occur between $t_0 + \tau$ and
$t_0 + \tau + \Delta t$. Since

$$1 - F(\tau) = \text{probability that event did not occur between } t_0 \text{ and } t_0 + \tau$$

$$\frac{\Delta t}{\bar{\tau}} = \text{probability that it did occur in } \Delta t$$

it follows that

$$F(\tau + \Delta t) - F(\tau) = \left[1 - F(\tau)\right]\left(\frac{\Delta t}{\bar{\tau}}\right)$$

Upon dividing both sides by $\Delta t$ and letting $\Delta t$ approach zero, it
is clear that

$$\lim_{\Delta t \to 0} \frac{F(\tau + \Delta t) - F(\tau)}{\Delta t} = \frac{dF(\tau)}{d\tau} = \frac{1}{\bar{\tau}}\left[1 - F(\tau)\right]$$

The latter two terms comprise a first-order differential equation that can be
solved to yield

$$F(\tau) = 1 - \exp\left(\frac{-\tau}{\bar{\tau}}\right) \qquad \tau \ge 0 \tag{2-41}$$

Figure 2-16 Time interval between events.

In evaluating the arbitrary constant, use is made of the fact that F(0) = 0 since
$\tau$ can never be negative.

The probability density function for the time interval between events can be
obtained from (2-41) by differentiation. Thus,

$$f(\tau) = \begin{cases} \dfrac{1}{\bar{\tau}} \exp\left(\dfrac{-\tau}{\bar{\tau}}\right) & \tau \ge 0 \\ 0 & \tau < 0 \end{cases} \tag{2-42}$$

This is known as the exponential probability density function and is sketched in
Figure 2-17 for two different values of average time interval.

Figure 2-17 The exponential probability density function.

As would be expected, the mean value of $\tau$ is just $\bar{\tau}$. That is,

$$E[\tau] = \int_0^\infty \frac{\tau}{\bar{\tau}} \exp\left(\frac{-\tau}{\bar{\tau}}\right) d\tau = \bar{\tau}$$

The variance turns out to be

$$\sigma_\tau^2 = (\bar{\tau})^2$$

It may be noted that this density function (like the Rayleigh) is a
single-parameter one. Thus the mean and variance are uniquely related and one
determines the other.

As an illustration of the application of the exponential distribution, suppose
that component failures in a spacecraft occur independently and uniformly with an
average time between failures of 100 days. The spacecraft starts out on a 200-day
mission with all components functioning. What is the probability that it will
complete the mission without a component failure? This is equivalent to asking
for the probability that the time to the first failure is greater than 200 days;
this is simply [1 - F(200)] since F(200) is the probability that this interval is
less than (or equal to) 200 days. Hence, from (2-41)

$$1 - F(\tau) = 1 - \left[1 - \exp\left(\frac{-\tau}{\bar{\tau}}\right)\right] = \exp\left(\frac{-\tau}{\bar{\tau}}\right)$$

and for $\bar{\tau} = 100$, $\tau = 200$, this becomes

$$1 - F(200) = \exp\left(\frac{-200}{100}\right) = 0.1353$$

As a second example of the application of the exponential distribution consider
a traveling wave tube (TWT) used as an amplifier in a satellite communication
system and assume that it has a mean-time-to-failure (MTF) of 4 years. That is,
the average lifetime of such a traveling wave tube is 4 years, although any
particular device may fail sooner or last longer. Since the actual lifetime, T,
is a random variable with an exponential distribution, we can determine the
probability associated with any specified lifetime. For example, the probability
that the TWT will survive for more than 4 years is

$$\Pr(T > 4) = 1 - F(4) = 1 - (1 - e^{-4/4}) = 0.368$$

Similarly, the probability that the TWT will fail within the first year is

$$\Pr(T \le 1) = F(1) = 1 - e^{-1/4} = 0.221$$

or the probability that it will fail between the 4th and the 6th years is

$$\Pr(4 < T \le 6) = F(6) - F(4) = (1 - e^{-6/4}) - (1 - e^{-4/4}) = 0.145$$

Finally, the probability that the TWT will last as long as 10 years is

$$\Pr(T > 10) = 1 - F(10) = 1 - (1 - e^{-10/4}) = 0.0821$$
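
Probabilities of this kind are simple to evaluate from (2-41). The sketch below
is illustrative; the function names `exponential_cdf` and `survival` are
hypothetical, and times are in whatever unit is used for the mean interval.

```python
import math

def exponential_cdf(t, mean_interval):
    """Distribution function (2-41) for the time between events."""
    if t < 0:
        return 0.0
    return -math.expm1(-t / mean_interval)     # 1 - exp(-t / mean_interval)

def survival(t, mean_interval):
    """Probability that the interval exceeds t."""
    return 1.0 - exponential_cdf(t, mean_interval)

# Spacecraft: 100-day mean time between failures, 200-day mission
print(survival(200.0, 100.0))

# Traveling wave tube: 4-year mean lifetime
print(survival(4.0, 4.0))
print(exponential_cdf(1.0, 4.0))
print(exponential_cdf(6.0, 4.0) - exponential_cdf(4.0, 4.0))
print(survival(10.0, 4.0))
```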


The random variable in the exponential distribution is the time interval between
adjacent events. This can be generalized to make the random variable the time
interval between any event and the kth following event. The probability
distribution for this random variable is known as the Erlang distribution and the
probability density function is

$$f_k(\tau) = \begin{cases} \dfrac{\tau^{k-1} \exp(-\tau/\bar{\tau})}{(\bar{\tau})^k (k - 1)!} & \tau \ge 0,\; k = 1, 2, 3, \ldots \\ 0 & \tau < 0 \end{cases} \tag{2-43}$$

Such a random variable is said to be an Erlang random variable of order k. Note
that the exponential distribution is simply the special case for k = 1. The mean
and variance in the general case are $k\bar{\tau}$ and $k(\bar{\tau})^2$
respectively. The general Erlang distribution has a great many applications in
engineering pertaining to the reliability of systems, the waiting times for users
of a system (such as a telephone system or traffic system), and the number of
channels required in a communication system to provide for a given number of
users with random calling times and message lengths.

The Erlang distribution is also related to the gamma distribution by a simple
change in notation. Letting $\beta = 1/\bar{\tau}$ and $\alpha$ be a continuous
parameter that equals k for integral values, the gamma distribution can be
written as

$$f(\tau) = \begin{cases} \dfrac{\beta^{\alpha} \tau^{\alpha - 1}}{\Gamma(\alpha)} \exp(-\beta\tau) & \tau \ge 0 \\ 0 & \tau < 0 \end{cases} \tag{2-44}$$

The mean and variance of the gamma distribution are $\alpha/\beta$ and
$\alpha/\beta^2$ respectively.
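
The correspondence between the Erlang and gamma forms, and the stated mean and
variance, can be checked numerically. The following sketch is an illustration
only: the function names are hypothetical, the moments are estimated with a
midpoint rule over an assumed finite range, and the order k = 3 with a mean
interval of 2 is an arbitrary choice.

```python
import math

def erlang_pdf(t, k, mean_interval):
    """Erlang density (2-43) of order k (a positive integer)."""
    if t < 0:
        return 0.0
    return (t ** (k - 1) * math.exp(-t / mean_interval)
            / (mean_interval ** k * math.factorial(k - 1)))

def gamma_pdf(t, alpha, beta):
    """Gamma density (2-44), evaluated through logarithms for t > 0."""
    if t <= 0:
        return 0.0       # t = 0 is treated as outside the support in this sketch
    log_f = (alpha * math.log(beta) + (alpha - 1) * math.log(t)
             - beta * t - math.lgamma(alpha))
    return math.exp(log_f)

def numeric_moment(pdf, power, upper, steps=100000):
    """Midpoint-rule estimate of the integral of t**power * pdf(t) on (0, upper)."""
    h = upper / steps
    return h * sum(((i + 0.5) * h) ** power * pdf((i + 0.5) * h) for i in range(steps))

k, mean_interval = 3, 2.0
pdf = lambda t: erlang_pdf(t, k, mean_interval)
mean = numeric_moment(pdf, 1, upper=200.0)
var = numeric_moment(pdf, 2, upper=200.0) - mean ** 2
print(mean, var)         # compare with k * mean_interval and k * mean_interval**2
```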





Exercise 2-7.2 


A package of 100-watt light bulbs states that the average lifetime is 
750 hours. Two light bulbs from this package are installed in a lamp 
fixture at the same time. If it is assumed that the lifetimes of the two 
bulbs are independent random variables, find: 


a) the probability that both bulbs will burn out before 750 hours 


b) the probability that one bulb will burn out before 750 hours and 
the other one will burn out after 750 hours 


c) the probability that both bulbs will last longer than 750 hours. 


Answers: 0.1353, 0.2325, 0.3996 





Delta distributions. It was noted earlier that when the possible events could 
assume only a discrete set of values, the appropriate probability density function 
consisted of a set of delta functions. It is desirable to formalize this concept some- 
what and indicate some possible applications. As an example, consider the binary 
waveform illustrated in Figure 2-18. Such a waveform arises in many types of 
communication systems or control systems since it obviously is the waveform with 
the greatest average power for a given peak value. It will be considered in more
detail throughout the study of random processes, but the present interest is in a
single random variable, $X = x(t_1)$, at a specified time instant. This random
variable can assume only two possible values, $x_1$ or $x_2$; it is specified that it
take on value $x_1$ with probability $p_1$ and value $x_2$ with probability
$p_2 = 1 - p_1$. Thus, the probability density function for X is

$$f(x) = p_1\,\delta(x - x_1) + p_2\,\delta(x - x_2) \tag{2-45}$$

Figure 2-18 A general binary waveform.

The mean value associated with this random variable is evaluated easily as

$$\bar{X} = \int_{-\infty}^{\infty} x\,[p_1\,\delta(x - x_1) + p_2\,\delta(x - x_2)]\,dx = p_1 x_1 + p_2 x_2$$

The mean-square value is determined similarly from

$$\overline{X^2} = \int_{-\infty}^{\infty} x^2\,[p_1\,\delta(x - x_1) + p_2\,\delta(x - x_2)]\,dx = p_1 x_1^2 + p_2 x_2^2$$

Hence, the variance is

$$\sigma_X^2 = \overline{X^2} - (\bar{X})^2 = p_1 x_1^2 + p_2 x_2^2 - (p_1 x_1 + p_2 x_2)^2 = p_1 p_2 (x_1 - x_2)^2$$

in which use has been made of the fact that $p_2 = 1 - p_1$ in order to arrive at the
final form.

It should be clear that similar delta distributions exist for random variables that
can assume any number of discrete levels. Thus, if there are n possible levels
designated as $x_1, x_2, \ldots, x_n$, and the corresponding probabilities for each level
are $p_1, p_2, \ldots, p_n$, then the probability density function is

$$f(x) = \sum_{i=1}^{n} p_i\,\delta(x - x_i) \tag{2-46}$$

in which

$$\sum_{i=1}^{n} p_i = 1$$

By using exactly the same techniques as above, the mean value of this random
variable is shown to be

$$\bar{X} = \sum_{i=1}^{n} p_i x_i$$

and the mean-square value is

$$\overline{X^2} = \sum_{i=1}^{n} p_i x_i^2$$

From these, the variance becomes

$$\sigma_X^2 = \sum_{i=1}^{n} p_i x_i^2 - \left(\sum_{i=1}^{n} p_i x_i\right)^2 = \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} p_i p_j (x_i - x_j)^2$$


The multilevel delta distributions also arise in connection with communication 
and control systems, and in systems requiring analog-to-digital conversion. Typi- 
cally the number of levels is an integer power of 2, so that they can be efficiently 
represented by a set of binary digits. 
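As a small numerical illustration of the multilevel results, the sketch below computes the mean, mean-square value, and variance of a discrete random variable directly from its levels and probabilities, and compares the variance with the pairwise double-sum form. The function names and the four equally likely levels are assumptions chosen only for illustration.

```python
def discrete_moments(levels, probs):
    """Mean, mean-square value, and variance of a multilevel delta distribution."""
    if abs(sum(probs) - 1.0) > 1e-12:
        raise ValueError("probabilities must sum to 1")
    mean = sum(p * x for x, p in zip(levels, probs))
    mean_square = sum(p * x * x for x, p in zip(levels, probs))
    return mean, mean_square, mean_square - mean ** 2


def variance_pairwise(levels, probs):
    """Variance written as one-half the double sum of p_i p_j (x_i - x_j)^2."""
    return 0.5 * sum(
        pi * pj * (xi - xj) ** 2
        for xi, pi in zip(levels, probs)
        for xj, pj in zip(levels, probs)
    )


# Hypothetical four-level (two-bit) quantizer output with equally likely levels.
levels = [-3.0, -1.0, 1.0, 3.0]
probs = [0.25, 0.25, 0.25, 0.25]

mean, mean_square, variance = discrete_moments(levels, probs)
print(mean, mean_square, variance)
print(variance_pairwise(levels, probs))  # should match the variance above
```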





Exercise 2-7.3 


When four coins are tossed, the random variable is taken to be the 
number of heads that result. Find: 
a) the mean value of this random variable 


b) the variance of this random variable. 


Answers: 1.0, 2.0 





2-8 Conditional Probability Distribution and
Density Functions 


The concept of conditional probability was introduced in Section 1-7 in connection
with the occurrence of discrete events. In that context it was the quantity
expressing the probability of one event given that another event, in the same
probability space, had already taken place. It is desirable to extend this concept 
to the case of continuous random variables. The discussion in the present section 
will be limited to definitions and examples involving a single random variable. 
The case of two or more random variables is considered in Chapter 3. 

The first step is to define the conditional probability distribution function for a
random variable X given that an event M has taken place. For the moment the
event M is left arbitrary. The distribution function is denoted and defined by

$$F(x|M) = \Pr[X \le x \mid M] = \frac{\Pr\{X \le x, M\}}{\Pr(M)} \qquad \Pr(M) > 0 \tag{2-47}$$

where $\{X \le x, M\}$ is the event of all outcomes $\xi$ such that

$$X(\xi) \le x \quad \text{and} \quad \xi \in M$$

where $X(\xi)$ is the value of the random variable X when the outcome of the
experiment is $\xi$. Hence $\{X \le x, M\}$ is the continuous counterpart of the set product
used in the previous definition of (1-17). It can be shown that F(x|M) is a valid
probability distribution function and, hence, must have the same properties as any
other distribution function. In particular, it has the following characteristics:


1. $0 \le F(x|M) \le 1 \qquad -\infty < x < \infty$
2. $F(-\infty|M) = 0 \qquad F(\infty|M) = 1$
3. F(x|M) is nondecreasing as x increases
4. $\Pr[x_1 < X \le x_2 \mid M] = F(x_2|M) - F(x_1|M) \ge 0$ for $x_1 < x_2$


Now it is necessary to say something about the event M upon which the prob- 
ability is conditioned. There are several different possibilities that arise. For ex- 
ample: 


1. Event M may be an event that can be expressed in terms of the random
variable X. Examples of this are considered in this section. 

2. Event M may be an event that depends upon some other random variable, 
which may be either continuous or discrete. Examples of this are consid- 
ered in Chapter 3. 

3. Event M may be an event that depends upon both the random variable X 
and some other random variable. This is a more complicated situation that 
will not be considered at all. 


As an illustration of the first possibility above, let M be the event

$$M = \{X \le m\}$$

Then the conditional distribution function is, from (2-47),

$$F(x|M) = \Pr\{X \le x \mid X \le m\} = \frac{\Pr\{X \le x, X \le m\}}{\Pr\{X \le m\}}$$

There are now two possible situations, depending upon whether x or m is larger.
If $x \ge m$, then the event that $X \le m$ is contained in the event that $X \le x$ and

$$\Pr\{X \le x, X \le m\} = \Pr\{X \le m\}$$

Thus,

$$F(x|M) = \frac{\Pr\{X \le m\}}{\Pr\{X \le m\}} = 1 \qquad x \ge m$$

On the other hand, if $x \le m$, then $\{X \le x\}$ is contained in $\{X \le m\}$ and

$$F(x|M) = \frac{\Pr\{X \le x\}}{\Pr\{X \le m\}} = \frac{F(x)}{F(m)} \qquad x \le m$$

The resulting conditional distribution function is shown in Figure 2-19.

Figure 2-19 A conditional probability distribution function.
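The two cases of this result translate directly into a short routine. The sketch below is illustrative only: it builds the conditional distribution function for the event that X does not exceed m from any unconditional distribution function, and the exponential distribution with unit mean is an assumed example, not one taken from the text.

```python
import math


def conditional_cdf(cdf, m):
    """Return F(x | X <= m) constructed from an unconditional distribution function."""
    f_m = cdf(m)
    if f_m <= 0.0:
        raise ValueError("Pr(M) must be greater than zero")

    def cond(x):
        # For x >= m the conditioning event is contained in {X <= x}.
        if x >= m:
            return 1.0
        # For x < m the event {X <= x} is contained in the conditioning event.
        return cdf(x) / f_m

    return cond


def exponential_cdf(x, mean=1.0):
    """Illustrative unconditional distribution function (exponential, unit mean)."""
    return 1.0 - math.exp(-x / mean) if x > 0 else 0.0


F_cond = conditional_cdf(exponential_cdf, m=2.0)
for x in (-1.0, 1.0, 2.0, 5.0):
    print(x, F_cond(x))
```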
The conditional probability density function is related to the distribution
function in the same way as before. That is, when the derivative exists,

$$f(x|M) = \frac{dF(x|M)}{dx} \tag{2-48}$$

This also has all the properties of a usual probability density function. That is,

1. $f(x|M) \ge 0 \qquad -\infty < x < \infty$
2. $\int_{-\infty}^{\infty} f(x|M)\,dx = 1$
3. $F(x|M) = \int_{-\infty}^{x} f(u|M)\,du$
4. $\int_{x_1}^{x_2} f(x|M)\,dx = \Pr[x_1 < X \le x_2 \mid M]$


If the example of Figure 2-19 is continued, the conditional probability density
function is

$$f(x|M) = \begin{cases} \dfrac{1}{F(m)}\dfrac{dF(x)}{dx} = \dfrac{f(x)}{F(m)} = \dfrac{f(x)}{\int_{-\infty}^{m} f(x)\,dx} & x < m \\ 0 & x \ge m \end{cases}$$

This is sketched in Figure 2-20. 
The conditional probability density function can also be used to find conditional
means and conditional expectations. For example, the conditional mean is

$$E[X|M] = \int_{-\infty}^{\infty} x f(x|M)\,dx \tag{2-49}$$

Figure 2-20 Conditional probability density function corresponding to Figure 2-19.

More generally, the conditional expectation of any g(X) is

$$E[g(X)|M] = \int_{-\infty}^{\infty} g(x) f(x|M)\,dx \tag{2-50}$$


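As an illustration of the conditional mean, let the f(x) in the above example be
Gaussian so that

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left[-\frac{(x - \bar{X})^2}{2\sigma^2}\right]$$

In order to make the example simple, let $m = \bar{X}$ so that

$$F(m) = \int_{-\infty}^{\bar{X}} \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left[-\frac{(x - \bar{X})^2}{2\sigma^2}\right] dx = \frac{1}{2}$$

Thus

$$f(x|M) = \begin{cases} \dfrac{f(x)}{1/2} = \dfrac{2}{\sqrt{2\pi}\,\sigma}\exp\left[-\dfrac{(x - \bar{X})^2}{2\sigma^2}\right] & x < \bar{X} \\ 0 & x \ge \bar{X} \end{cases}$$

Hence, the conditional mean is

$$E[X|M] = \int_{-\infty}^{\bar{X}} \frac{2x}{\sqrt{2\pi}\,\sigma}\exp\left[-\frac{(x - \bar{X})^2}{2\sigma^2}\right] dx = \int_{-\infty}^{0} \frac{2(u + \bar{X})}{\sqrt{2\pi}\,\sigma}\exp\left[-\frac{u^2}{2\sigma^2}\right] du = \bar{X} - \sqrt{\frac{2}{\pi}}\,\sigma$$

In words, this result says that the expected value or conditional mean of a
Gaussian random variable, given that the random variable is less than its mean, is
just $\bar{X} - \sqrt{2/\pi}\,\sigma$.

As a second illustration of this formulation of conditional probability, let us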
consider another archery problem. In this case, let the target be 12 inches in
diameter and assume that the standard deviation of the hit positions is 4 inches in
both the X-direction and the Y-direction. Hence, the unconditional mean value of
miss distance from the center of the target, for all attempts, including those that
miss the target completely, is just $\bar{R} = 4\sqrt{\pi/2} = 5.013$ inches. We now seek to
find the conditional mean value of the miss distance given that the arrow strikes
the target. Hence, we define the event M to be the event that the miss distance R
is less than or equal to six inches. Thus, the conditional probability density
function appears as

$$f(r|M) = \frac{f(r)}{F(6)}$$

Since the unconditional density function on R is

$$f(r) = \frac{r}{16}\exp\left(-\frac{r^2}{32}\right) \qquad r \ge 0$$

and the probability that R is less than or equal to 6 is

$$F(6) = 1 - e^{-36/32} = 0.675$$

it follows that the desired conditional density function is

$$f(r|M) = \frac{r}{10.806}\exp\left(-\frac{r^2}{32}\right) \qquad 0 \le r \le 6$$


Hence, the conditional mean value of the miss distance is 





$$E[R|M] = \int_{0}^{6} \frac{r^2}{10.806}\exp\left(-\frac{r^2}{32}\right) dr = 3.547 \text{ inches}$$


in which the integration has been carried out numerically. Note that this value is 
considerably smaller than the unconditional miss distance. 
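Both conditional means in this section can be evaluated with one small numerical routine. The following is a sketch under stated assumptions: the midpoint-rule integrator, the function names, the Gaussian parameters (mean 3, standard deviation 2), and the truncation of the Gaussian lower limit at ten standard deviations below the mean are all illustrative choices. It checks the Gaussian case against the closed-form result and then evaluates the archery integral.

```python
import math


def conditional_mean_below(pdf, lower, m, steps=100_000):
    """E[X | X <= m] for a density supported on [lower, m], by the midpoint rule."""
    h = (m - lower) / steps
    numerator = denominator = 0.0
    for i in range(steps):
        x = lower + (i + 0.5) * h
        weight = pdf(x) * h
        numerator += x * weight      # integral of x f(x)
        denominator += weight        # integral of f(x), i.e. F(m)
    return numerator / denominator


def gaussian_pdf(x, mean, sigma):
    return math.exp(-((x - mean) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)


def rayleigh_pdf(r, sigma):
    return (r / sigma ** 2) * math.exp(-(r ** 2) / (2 * sigma ** 2)) if r >= 0 else 0.0


# Gaussian conditioned on being below its mean (illustrative parameters).
mean, sigma = 3.0, 2.0
gauss_numeric = conditional_mean_below(
    lambda x: gaussian_pdf(x, mean, sigma), mean - 10 * sigma, mean
)
gauss_closed = mean - math.sqrt(2 / math.pi) * sigma
print(gauss_numeric, gauss_closed)

# Archery example: Rayleigh miss distance with sigma = 4, given a hit within 6 inches.
archery_mean = conditional_mean_below(lambda r: rayleigh_pdf(r, 4.0), 0.0, 6.0)
print(archery_mean)
```

Dividing by the numerically integrated denominator, rather than by a tabulated value of F(m), keeps the routine usable for densities whose distribution function has no closed form.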





Exercise 2-8.1


A Gaussian random voltage having zero mean and a standard deviation
of 10 V is connected in series with a 10-ohm resistor and an ideal
diode. Find the mean value of the resulting current using the concepts 
of conditional probability. 


Answer: 0.7979


Exercise 2-8.2

A traveling wave tube has a mean-time-to-failure of 4 years. Given 
that the TWT has survived for 4 years, find the conditional probability 
that it will fail between the 4th and 6th years. 


Answer: 0.3935 





2-9 Examples and Applications


The preceding sections have introduced some of the basic concepts concerning the 
probability distribution and density functions for a continuous random variable. 
Before extending these concepts to more than one variable, it is desirable to con- 
sider a few examples illustrating how they might be applied to simple engineering 
problems. 

As a first example, consider the elementary voltage-regulating circuit shown in
Figure 2-21(a). It employs a Zener diode having an idealized current-voltage
characteristic as shown in Figure 2-21(b). Note that current is zero until the
voltage reaches the breakdown value ($V_z = 10$) and from then on is limited by the
external circuit, while the voltage across the diode remains constant. Such a
circuit is often used to limit the voltage applied to solid-state devices. For example,
the $R_L$ indicated in the circuit may be a transistorized amplifier designed to work
at 9 V and that is damaged if the voltage exceeds 10 V. The supply voltage, $V_s$,
is from a power supply whose nominal voltage is 12 V, but whose actual voltage
contains a sawtooth ripple and, hence, is a random variable. For purposes of this
example, it will be assumed that this random variable has a uniform distribution
over the interval from 9 to 15 V.


Figure 2-21 Zener diode voltage regulator: (a) voltage-regulating circuit and (b)
Zener diode characteristic.


Figure 2-22 Relation between diode power dissipation and supply voltage.


Zener diodes are rated in terms of their ability to dissipate power as well as
their breakdown voltage. It will be assumed that the average power rating of this
diode is $\bar{W}_z = 3$ W. It is then desired to find the value of series resistance, R,
needed to limit the mean dissipation in the Zener diode to this rated value.

When the Zener diode is conducting, the voltage across it is $V_z = 10$, and the
current through it is

$$I_z = \frac{V_s - V_z}{R} - I_L \qquad V_s > \frac{V_z(R + R_L)}{R_L} = \frac{10(R + 10)}{10} = R + 10$$

where the load current, $I_L$, is 1 A. The power dissipated in the diode is

$$W_z = V_z I_z = \frac{V_z(V_s - V_z)}{R} - I_L V_z = \frac{10V_s - 100}{R} - 10 \qquad V_s > R + 10$$

A sketch of this power as a function of the supply voltage $V_s$ is shown in Figure
2-22, and the probability density functions of $V_s$ and $W_z$ are shown in Figure
2-23. Note that the density function of $W_z$ has a large delta function at zero, since
the diode is not conducting most of the time, but is uniform for larger values of
W since $W_z$ and $V_s$ are linearly related in this range. From the previous discussion
of transformations of density functions in Section 2-3, it is easy to show that

$$f_W(w) = \begin{cases} F_V(R + 10)\,\delta(w) + \dfrac{R}{10}\,f_V\!\left(\dfrac{Rw}{10} + R + 10\right) & 0 \le w \le \dfrac{50}{R} - 10 \\ 0 & \text{elsewhere} \end{cases}$$

where $F_V(\cdot)$ is the distribution function of $V_s$. Hence, the area of the delta function
is simply the probability that the supply voltage $V_s$ is less than the value that
causes diode conduction to start.

Figure 2-23 Probability density functions for supply voltage and diode power
dissipation: (a) probability density function for $V_s$ and (b) probability density
function for $W_z$.

The mean value of diode power dissipation is now given by

$$E[W_z] = \bar{W}_z = \int_{-\infty}^{\infty} w f_W(w)\,dw = \int_{-\infty}^{\infty} w F_V(R + 10)\,\delta(w)\,dw + \int_{0}^{(50/R)-10} w\left(\frac{R}{10}\right) f_V\!\left(\frac{Rw}{10} + R + 10\right) dw$$

The first integral has a value of zero (since the delta function is at w = 0) and
the second integral can be written in terms of the uniform density function
[$f_V(v) = 1/6$, $9 < v \le 15$], as

$$\bar{W}_z = \int_{0}^{(50/R)-10} w\left(\frac{R}{10}\right)\left(\frac{1}{6}\right) dw = \frac{(5 - R)^2}{1.2R}$$

Since the mean value of diode power dissipation is to be less than or equal to 3
watts, it follows that

$$\frac{(5 - R)^2}{1.2R} \le 3 \qquad 0 < R \le 5$$

from which

$$R \ge 2.19\ \Omega$$

It may now be concluded that any value of R greater than 2.19 Ω would be
satisfactory from the standpoint of limiting the mean value of power dissipation
in the Zener diode to 3 watts. The actual choice of R would be determined by the
desired value of output voltage at the nominal supply voltage of 12 V. If this
desired voltage is 9 V (as suggested above), the 10-Ω load draws 0.9 A and R must
drop 3 V, so that R must be

$$R = \frac{12 - 9}{0.9} = 3.33\ \Omega$$

which is greater than the minimum value of 2.19 Ω and, hence, would be
satisfactory.

Figure 2-24 Selection of a voltmeter resistor.
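The Zener diode calculation above can be cross-checked numerically. The sketch below is illustrative rather than definitive: the constant and function names are assumptions, and the mean dissipation is estimated by averaging the diode power over the uniform supply voltage with a simple midpoint rule. It compares that estimate with the closed-form expression and solves the quadratic that results from setting the mean dissipation equal to the 3-W rating.

```python
import math

V_Z = 10.0                 # breakdown voltage, volts
I_L = 1.0                  # load current while the diode regulates, amperes
V_LOW, V_HIGH = 9.0, 15.0  # range of the uniformly distributed supply voltage
W_RATED = 3.0              # allowed mean dissipation, watts


def diode_power(v_s, r):
    """Diode dissipation for supply voltage v_s; zero until v_s exceeds r + 10."""
    if v_s <= r + 10.0:
        return 0.0
    return V_Z * ((v_s - V_Z) / r - I_L)


def mean_power_numeric(r, steps=60_000):
    """Average diode power over the uniform supply voltage (midpoint rule)."""
    h = (V_HIGH - V_LOW) / steps
    total = sum(diode_power(V_LOW + (i + 0.5) * h, r) for i in range(steps))
    return total * h / (V_HIGH - V_LOW)


def mean_power_closed(r):
    """Closed form (5 - R)^2 / (1.2 R), valid for 0 < R <= 5."""
    return (5.0 - r) ** 2 / (1.2 * r)


def minimum_resistance(rated=W_RATED):
    """Smaller root of (5 - R)^2 = 1.2 * rated * R, the least acceptable R."""
    b = 10.0 + 1.2 * rated
    return (b - math.sqrt(b * b - 100.0)) / 2.0


print(mean_power_numeric(3.0), mean_power_closed(3.0))
print(minimum_resistance())
print(mean_power_closed(3.33))  # dissipation at the value chosen for a 9-V output
```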

As another example, consider the problem of selecting a multiplier resistor for
a dc voltmeter as shown in Figure 2-24. It will be assumed that the dc instrument
produces full-scale deflection when 100 µA is passing through the coil and has a
resistance of 1000 Ω. It is desired to select a multiplier resistor R such that this
instrument will read full scale when 10 V is applied. Thus, the nominal value of
R to accomplish this (which will be designated as R*) is

$$R^* = \frac{10}{10^{-4}} - 1000 = 9.9 \times 10^4\ \Omega$$

However, the actual resistor used will be selected at random from a bin of
resistors marked $10^5$ Ω. Because of manufacturing tolerances, the actual resistance
is a random variable having a mean of $10^5$ and a standard deviation of 1000 Ω. It
will also be assumed that the actual resistance is a Gaussian random variable.
(This is a customary assumption when deviations around the mean are small, even
though it can never be precisely true for quantities that must be always positive,
like the resistance.) On the basis of these assumptions it is desired to find the
probability that the resulting voltmeter will be accurate to within 2 percent.*

* This is interpreted to mean that the error in voltmeter reading due to the
resistor value is less than or equal to 2 percent of the full scale reading.

The smallest value of resistance that would be acceptable is

$$R_{\min} = \frac{10 - 0.2}{10^{-4}} - 1000 = 9.7 \times 10^4\ \Omega$$

while the largest value is

$$R_{\max} = \frac{10 + 0.2}{10^{-4}} - 1000 = 10.1 \times 10^4\ \Omega$$

The probability that a resistor selected at random will fall between these two limits
is

$$P_c = \Pr[9.7 \times 10^4 < R \le 10.1 \times 10^4] = \int_{9.7 \times 10^4}^{10.1 \times 10^4} f_R(r)\,dr \tag{2-51}$$

where $f_R(r)$ is the Gaussian probability density function for R and is given by

$$f_R(r) = \frac{1}{\sqrt{2\pi}\,(1000)}\exp\left[-\frac{(r - 10^5)^2}{2(10^6)}\right]$$

The integral in (2-51) can be expressed in terms of the standard normal
distribution function, $\Phi(\cdot)$, as discussed in Section 2-5. Thus, $P_c$ becomes

$$P_c = \Phi\!\left(\frac{10.1 \times 10^4 - 10^5}{1000}\right) - \Phi\!\left(\frac{9.7 \times 10^4 - 10^5}{1000}\right)$$

which can be simplified to

$$P_c = \Phi(1) - \Phi(-3) = \Phi(1) - [1 - \Phi(3)]$$

Using the tables in Appendix D, this becomes

$$P_c = 0.8413 - [1 - 0.9987] = 0.8400$$


Thus, it appears that even though the resistors are selected from a supply that is 
nominally incorrect, there is still a substantial probability that the resulting instru- 
ment will be within acceptable limits of accuracy. 
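The table lookup in the voltmeter example can be reproduced with the error function from the standard library. In this sketch the constant names are illustrative assumptions; the standard normal distribution function is written in terms of `math.erf`, and the resistance limits are recomputed from the 2 percent tolerance.

```python
import math


def std_normal_cdf(z):
    """Standard normal distribution function expressed through the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


R_METER = 1000.0     # coil resistance, ohms
I_FULL = 100e-6      # full-scale current, amperes
V_FULL = 10.0        # desired full-scale voltage, volts
TOLERANCE = 0.02     # allowed error as a fraction of full scale
R_MEAN, R_SIGMA = 1e5, 1000.0  # statistics of the resistors in the bin

r_nominal = V_FULL / I_FULL - R_METER
r_min = V_FULL * (1.0 - TOLERANCE) / I_FULL - R_METER
r_max = V_FULL * (1.0 + TOLERANCE) / I_FULL - R_METER

p_within = std_normal_cdf((r_max - R_MEAN) / R_SIGMA) - std_normal_cdf((r_min - R_MEAN) / R_SIGMA)
print(r_nominal, r_min, r_max)
print(p_within)
```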

The third example considers an application of conditional probability. This ex- 
ample considers a traffic measurement system that is measuring the speed of all 
vehicles on an expressway and recording those speeds in excess of the speed limit 
of 70 miles per hour (mph). If the vehicle speed is a random variable with a 
Rayleigh distribution and a most probable value equal to 50 mph, it is desired to 
find the mean value of the excess speed. This is equivalent to finding the condi- 
tional mean of vehicle speed, given that the speed is greater than the limit, and 
subtracting the limit from it. 

Letting the vehicle speed be S, the conditional distribution function that is
sought is

$$F[s|S > 70] = \frac{\Pr\{S \le s, S > 70\}}{\Pr\{S > 70\}} \tag{2-52}$$

Since the numerator is nonzero only when s > 70, (2-52) can be written as

$$F[s|S > 70] = \begin{cases} 0 & s \le 70 \\ \dfrac{F(s) - F(70)}{1 - F(70)} & s > 70 \end{cases} \tag{2-53}$$

where $F(\cdot)$ is the probability distribution function for the random variable S. The
numerator of (2-53) is simply the probability that S is between 70 and s, while
the denominator is the probability that S is greater than 70.

The conditional probability density function is found by differentiating (2-53)
with respect to s. Thus,

$$f(s|S > 70) = \begin{cases} 0 & s \le 70 \\ \dfrac{f(s)}{1 - F(70)} & s > 70 \end{cases}$$

where f(s) is the Rayleigh density function given by

$$f(s) = \begin{cases} \dfrac{s}{(50)^2}\exp\left[-\dfrac{s^2}{2(50)^2}\right] & s \ge 0 \\ 0 & s < 0 \end{cases} \tag{2-54}$$

These functions are sketched in Figure 2-25.

Figure 2-25 Conditional and unconditional density functions for a
Rayleigh-distributed random variable.

The quantity F(70) is easily obtained from (2-54) as

$$F(70) = \int_{0}^{70} \frac{s}{(50)^2}\exp\left[-\frac{s^2}{2(50)^2}\right] ds = 1 - \exp\left[-\frac{49}{50}\right]$$

Hence,

$$1 - F(70) = \exp\left[-\frac{49}{50}\right]$$

The conditional expectation is given by

$$E[S|S > 70] = \frac{1}{\exp[-49/50]}\int_{70}^{\infty} \frac{s^2}{(50)^2}\exp\left[-\frac{s^2}{2(50)^2}\right] ds = 70 + 50\sqrt{2\pi}\,\exp\left[\frac{49}{50}\right]\left\{1 - \Phi\!\left(\frac{7}{5}\right)\right\} = 70 + 27.2$$


Thus, the mean value of the excess speed is 27.2 miles per hour. Although it is
clear from this result that the Rayleigh model is not a realistic one for traffic
systems (since 27.2 miles per hour excess speed is much too large for the actual
situation), the above example does illustrate the general technique for finding
conditional means.
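The conditional mean in the traffic example can be evaluated two ways as a consistency check. The sketch below is an illustration only: the names are assumptions, the upper integration limit of 1000 mph is an arbitrary truncation, and the Gaussian tail probability is obtained from `math.erfc`. The result should land close to the figure quoted in the text.

```python
import math

SIGMA = 50.0   # most probable speed of the Rayleigh model, mph
LIMIT = 70.0   # speed limit, mph


def rayleigh_pdf(s):
    return (s / SIGMA ** 2) * math.exp(-(s ** 2) / (2 * SIGMA ** 2)) if s >= 0 else 0.0


def conditional_mean_numeric(upper=1000.0, steps=200_000):
    """E[S | S > LIMIT] by midpoint integration of s f(s) over (LIMIT, upper)."""
    h = (upper - LIMIT) / steps
    numerator = sum((LIMIT + (i + 0.5) * h) * rayleigh_pdf(LIMIT + (i + 0.5) * h)
                    for i in range(steps)) * h
    tail = math.exp(-(LIMIT ** 2) / (2 * SIGMA ** 2))  # 1 - F(70)
    return numerator / tail


def conditional_mean_closed():
    """Closed form: LIMIT + sigma*sqrt(2*pi)*exp(LIMIT^2/(2 sigma^2))*[1 - Phi(LIMIT/sigma)]."""
    q = 0.5 * math.erfc(LIMIT / (SIGMA * math.sqrt(2.0)))  # 1 - Phi(7/5)
    return LIMIT + SIGMA * math.sqrt(2 * math.pi) * math.exp(LIMIT ** 2 / (2 * SIGMA ** 2)) * q


numeric = conditional_mean_numeric()
closed = conditional_mean_closed()
print(numeric, closed)
print(closed - LIMIT)  # mean excess speed
```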

The final example in this section combines the concepts of both discrete prob- 
ability and continuous random variables and deals with problems that might arise 
in designing a satellite communication system. In such a system, the satellite 
normally carries a number of traveling wave tubes in order to provide more chan- 
nels and to extend the useful life of the system as the tubes begin to fail. Consider 
a case in which the satellite is designed to carry 6 TWTs and it is desired to 
require that after 5 years of operation there is a probability of 0.95 that at least 
one of the TWTs is still good. The quantity that we need to find is the mean time 
to failure (MTF) for each tube in order to achieve this degree of reliability. In 
order to do this we need to use some of the results discussed in connection with 
Bernoulli trials in Sec. 1—10. In this case, let k be the number of good TWTs at 
any point in time and let p be the probability that any TWT is good. Since we 
want the probability that at least one tube is good to be 0.95, it follows that 


$$\Pr(k \ge 1) = 0.95$$

or

$$\sum_{k=1}^{6} p_6(k) = 1 - p_6(0) = 1 - \binom{6}{0} p^0 (1 - p)^6 = 0.95$$

which can be solved to yield p = 0.393. If we assume, as usual, that the lifetime
of any one TWT follows an exponential distribution, then

$$\int_{5}^{\infty} \frac{1}{\bar{\tau}}\,e^{-\tau/\bar{\tau}}\,d\tau = 0.393 \qquad \bar{\tau} = 5.353$$


Thus, the mean time to failure for each TWT must be at least 5.353 years in order 
to achieve the desired reliability. 
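The two steps of this reliability calculation invert simple closed-form relations, so they are easy to script. The sketch below is a hedged illustration: the function names are assumptions, and it relies on the same exponential-lifetime and independent-tube assumptions made in the text.

```python
import math


def required_survival_prob(n_tubes, target):
    """Per-tube survival probability p satisfying 1 - (1 - p)**n_tubes = target."""
    return 1.0 - (1.0 - target) ** (1.0 / n_tubes)


def required_mean_life(p, years):
    """Mean time to failure for which exp(-years / tau_bar) equals p."""
    return -years / math.log(p)


def tubes_needed(p, target):
    """Smallest integer n with 1 - (1 - p)**n >= target."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))


p = required_survival_prob(n_tubes=6, target=0.95)
tau_bar = required_mean_life(p, years=5.0)
print(p, tau_bar)
print(tubes_needed(p, target=0.99))
```

The last helper applies the same relation with the number of tubes as the unknown, which is the form needed when the target probability is raised while the tube quality is held fixed.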

A second question that might be asked is "How many TWTs would be needed
to achieve a probability of 0.99 that at least one will be still functioning after 5
years?" In this case, n is unknown but, for TWTs having the same MTF, the
value of p is still 0.393. Thus,

$$1 - p_n(0) = 0.99$$

$$\binom{n}{0} p^0 (1 - p)^n = 0.01$$


This may be solved for n to yield n = 9.22. However, since n must be an integer, 
this tells us that we must use at least 10 traveling wave tubes to achieve the 
required reliability.


Exercise 2-9.1

The current in a semiconductor diode is often modeled by the Shockley
equation

$$I = I_0[e^{\eta V} - 1]$$

in which V is the voltage across the diode, $I_0$ is the reverse current, $\eta$
is a constant that depends upon the physical diode and the temperature,
and I is the resulting diode current. For purposes of this exercise,
assume that $I_0 = 10^{-9}$ and $\eta = 25$. Find the resulting mean value of
current if the diode voltage is a random variable that is


a) uniformly distributed between 0 and 1

b) Gaussian with zero mean and variance of 0.07, or

c) Gaussian with zero mean and variance of 0.1. Comment on the
results.


Answers: 2.880, 3.163, 37,299


Exercise 2-9.2 


A Thevenin's equivalent source has an open-circuit voltage of 18 V 
and a source resistance that is a random variable that is uniformly 
distributed between 4 ohms and 16 ohms. Find: 


a) the value of load resistance that should be connected to this 
source in order that the mean value of the power delivered to 
the load is a maximum 


b) the resulting mean value of power. 


Answers: 8,9 





PROBLEMS 


2-1.1 For each of the following situations, list any quantities that might
reasonably be considered a random variable, state whether they are continuous
or discrete, and indicate a reasonable range of values for each.

a) A weather forecast gives the prediction for July 4th as: high
temperature, 84; low temperature, 67; wind, 8 mph; humidity, 75%; THI,
72; sunrise, 5:05 am; sunset, 8:45 pm.


b) A traffic survey on a busy street yields the following values: number
of vehicles per minute, 26; average speed, 35 mph; ratio of cars to
trucks, 6.81; average weight, 4000 lbs.; number of accidents per day,
5.


c) An electronic circuit contains 15 ICs, 12 LEDs, 43 resistors, and 12
capacitors. The resistors are all marked 1000 ohms, the capacitors
are all marked 0.01 microfarads, and the nominal supply voltage for
the circuit is 5 volts.


2-1.2 State whether each of the following random variables is continuous or 
discrete and indicate a reasonable range of values for each. 
a) The outcome associated with rolling a pair of dice. 


b) The outcome resulting from measuring the voltage of a 12-volt stor- 
age battery. 


c) The outcome associated with randomly selecting a telephone number 
from the telephone directory. 


d) The outcome resulting from weighing adult males.


2-2.1 When ten coins are flipped, the event of interest is the number of heads.
Let this number be the random variable.


a) Plot the distribution function for this random variable.


b) What is the probability that the random variable is between six and
nine inclusive?


c) What is the probability that the random variable is greater than or
equal to eight?


2-2.2 A random variable has a probability distribution function given by

$$F_X(x) = \begin{cases} 0 & -\infty < x \le -1 \\ 0.5 + 0.5x & -1 < x < 1 \\ 1 & 1 \le x < \infty \end{cases}$$

a) Find the probability that x = 1/4

b) Find the probability that x — :


c) Find the probability that $-0.5 < x \le 0.5$


2-2.3 A probability distribution function for a random variable X has the form

$$F_X(x) = \begin{cases} A\{1 - \exp[-(x - 1)]\} & 1 < x < \infty \\ 0 & -\infty < x \le 1 \end{cases}$$

a) For what value of A is this a valid probability distribution
function?

b) What is $F_X(2)$?

c) What is the probability that the random variable lies in the
interval $2 < X < \infty$?

d) What is the probability that the random variable lies in the
interval $1 < X \le 3$?


2-2.4 A random variable X has a probability distribution function of the form

$$F_X(x) = \begin{cases} 0 & -\infty < x \le -2 \\ A(1 + \cos bx) & -2 < x \le 2 \\ 1 & 2 < x < \infty \end{cases}$$

a) Find the values of A and b that make this a valid probability
distribution function.

b) Find the probability that X is greater than 1.

c) Find the probability that X is negative.


2-3.1 a) Find the probability density function of the random variable of
Problem 2-2.1 and sketch it.

b) Using the probability density function, find the probability that the
random variable is in the range between four and seven inclusive.

c) Using the probability density function, find the probability that the
random variable is less than four.


2-3.2 a) Find the probability density function of the random variable of
Problem 2-2.3 and sketch it.

b) Using the probability density function, find the probability that the
random variable is in the range $2 < X \le 3$.

c) Using the probability density function, find the probability that the
random variable is less than 2.

2-3.3 a) A random variable X has a probability density function of the form

$$f_X(x) = \exp(-2|x|) \qquad -\infty < x < \infty$$

A second random variable Y is related to X by $Y = X^2$. Find the
probability density function of the random variable Y.

b) Find the probability that Y is greater than 2.


2-3.4 a) A random variable Y is related to the random variable X of Problem
2-3.3 by $Y = 3X - 4$. Find the probability density function of the
random variable Y.

b) Find the probability that Y is negative.

c) Find the probability that Y is greater than X.


2-4.1 For the random variable of Problem 2-3.2 find:

a) The mean value of X.

b) The mean-square value of X.

c) The variance of X.


2-4.2 For the random variable X of Problem 2-2.4 find:

a) The mean value of X.

b) The mean-square value of X.

c) The third central moment of X.

d) The variance of X.


2-4.3 A random variable Y has a probability density function of the form

$$f(y) = \begin{cases} Ky & 0 < y \le 6 \\ 0 & \text{elsewhere} \end{cases}$$

a) Find the value of K for which this is a valid probability density
function.

b) Find the mean value of Y.

c) Find the mean-square value of Y.

d) Find the variance of Y.

e) Find the third central moment of Y.

f) Find the nth moment, $E[Y^n]$.

2-4.4 A power supply has five intermittent loads connected to it and each load,
when in operation, draws a power of 10 W. Each load is in operation
only one-quarter of the time and operates independently of all other loads.

a) Find the mean value of the power required by the loads.

b) Find the variance of the power required by the loads.

c) If the power supply can provide only 40 watts, find the probability
that it will be overloaded.


2-5.1 A Gaussian random voltage has a mean value of 10 and a variance of 25.

a) What is the probability that an observed value of the voltage is
greater than zero?

b) What is the probability that an observed value of the voltage is
greater than zero but less than or equal to the mean value?

c) What is the probability that an observed value of the voltage is
greater than twice the mean value?


2-5.2 For the Gaussian random variable of Problem 2-5.1 find:

a) The fourth central moment.

b) The fourth moment.

c) The third central moment.

d) The third moment.


2-5.3 A Gaussian random current has a probability of 0.5 of having value less
than or equal to 1.0. It also has a probability of 0.0228 of having a value
greater than 5.0.

a) Find the mean value of this random variable.

b) Find the variance of this random variable.

c) Find the probability that the random variable has a value less than or
equal to 3.0.


2-5.4 A common method for detecting a signal in the presence of noise is to
establish a threshold level and compare the value of any observation with
this threshold. If the threshold is exceeded, it is decided that signal is
present. Sometimes, of course, noise alone will exceed the threshold and
this is known as a "false alarm." Usually, it is desired to make the
probability of a false alarm very small. At the same time, we would like for
any observation that does contain a signal plus the noise to exceed the
threshold with a large probability. This is the probability of detection and
should be as close to 1.0 as possible. Suppose we have Gaussian noise
with zero mean and a variance of 1 V² and we set a threshold level of 5
volts.

a) Find the probability of false alarm.

b) If a signal having a value of 8 volts is observed in the presence of
this noise, find the probability of detection.


2-6.1 A Gaussian random current having zero mean and a variance of 4 A² is
passed through a resistance of 3 Ω.

a) Find the mean value of the power dissipated.

b) Find the variance of the power dissipated.

c) Find the probability that the instantaneous power will exceed 36 W.


2-6.2 A random variable X is Gaussian with zero mean and a variance of 1.0.
Another random variable, Y, is defined by $Y = X^3$.

a) Write the probability density function for the random variable Y.

b) Find the mean value of Y.

c) Find the variance of Y.


2-6.3 A current having a Rayleigh probability density function is passed
through a resistor having a resistance of 2π Ω. The mean value of the
current is 2 A.

a) Find the mean value of the power dissipated in the resistor.

b) Find the probability that the dissipated power is less than or equal to
12 W.

c) Find the probability that the dissipated power is greater than 72 W.

2-6.4 Marbles rolling on a flat surface have components of velocity in
orthogonal directions that are independent Gaussian random variables with zero
mean and a standard deviation of 3 ft/s.

a) Find the most probable speed of the marbles.

b) Find the mean value of the speed.

c) What is the probability of finding a marble with a speed greater than
10 ft/s?


2-6.5 The average speed of a nitrogen molecule in air at 20°C is about 500 m/s.
Find:

a) The variance of molecule speed.

b) The most probable molecule speed.

c) The rms molecule speed.


2-6.6 Five independent observations of a Gaussian random voltage with zero
mean and unit variance are made and a new random variable $X^2$ is formed
from the sum of the squares of these random voltages.

a) Find the mean value of $X^2$.

b) Find the variance of $X^2$.

c) What is the most probable value of $X^2$?


2-6.7 The log-normal density function is often expressed in terms of decibels
rather than nepers. In this case, the Gaussian random variable Y is related
to the log-normal random variable by $Y = 10 \log_{10} X$.

a) Write the probability density function for X when this relation is
used.

b) Write an expression for the mean value of X.

c) Write an expression for the variance of X.


2-7.1 A random variable Θ is uniformly distributed over a range of 0 to 2π.
Another random variable X is related to Θ by

$$X = \cos \Theta$$

a) Find the probability density function of X.

b) Find the mean value of X.

c) Find the variance of X.

d) Find the probability that X > 0.5.


A continuous-valued random voltage ranging between — 10 V and +10 
V is to be quantized so that it can be represented by a binary sequence. 


a) If the rms quantizing error is to be less than 1% of the maximum 
value of the voltage, find the minimum number of quantizing levels 
that are required. 


b) If the number of quantizing levels is to be a power of 2, find the 
minimum number of quantizing levels that will still meet the require- 
ment. 


c) How many binary digits are required to represent each quantizing 
level? 


A communications satellite is designed to have a mean time to failure 
(MTF) of five years. If the actual time to failure is a random variable that 
is exponentially distributed, find: 


a) The probability that the satellite will fail sooner than five years. 


b) The probability that the satellite will survive for ten years or more. 


c) The probability that the satellite will fail during the sixth year. 


A homeowner buys a package containing four light bulbs, each specified 
to have an average lifetime of 1000 hours. One bulb is placed in a single 
bulb table lamp and the remaining bulbs are used one after another to 
replace ones that burn out in this same lamp. 


a) Find the expected lifetime of the set of four light bulbs. 


b) Find the probability that the four light bulbs will last 5000 hours or 
more. 


c) Find the probability that the four light bulbs will all burn out in 2000 
hours or less. 


A continuous-valued signal has a probability density function that is 
uniform over the range from -8 V to +8 V. It is sampled and quantized 
into eight equally spaced levels ranging from -7 to +7. 


a) Write the probability density function for the discrete random vari- 
able representing one sample. 
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b) Find the mean value of this random variable. 


c) Find the variance of this random variable. 


a) For the communication satellite system of Problem 2-7.3, find the 
conditional probability that the satellite will survive for ten years or 
more given that it has survived for five years. 

b) Find the conditional mean lifetime of the system given that it has 
survived for three years. 


a) For the random variable X of Problem 2-7.1, find the conditional 
probability density function f(x|M), where M is the event 
$0 \le \Theta \le \pi/2$. Sketch this density function. 


b) Find the conditional mean E[X|M], for the same event M. 


A laser weapon is fired many times at a circular target that is 2 meters in 
diameter and it is found that one-tenth of the shots miss the target en- 
tirely. 


a) For those shots that hit the target, find the conditional probability that 
they will hit within 0.3 meter of the center. 


b) For those shots that miss the target completely, find the conditional 
probability that they come within 0.5 meter of the edge of the target. 


Consider again the threshold detection system described in Problem 2- 
5.4. 


a) When noise only is present, find the conditional mean value of the 
noise that exceeds the threshold. 


b) Repeat part (a) when both the specified signal and noise are present. 


Different types of electronic ac voltmeters produce deflections that are 
proportional to different characteristics of the applied waveforms. In most 
cases, however, the scale is calibrated so that the voltmeter correctly in- 
dicates the rms value of a sine wave. For other types of waveforms, the 
meter reading may not be equal to the rms value. Suppose the following 
instruments are connected to a Gaussian random voltage having zero 
mean and a standard deviation of 10 V. What will each read? 


a) An instrument in which the deflection is proportional to the average 
of the full-wave rectified waveform. That is, if X(t) is applied, the 
deflection is proportional to E[|X(t)|]. 


2-9.2 


2-9.3 


2-9.4 
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b) An instrument in which the deflection is proportional to the average 
of the envelope of the waveform. Remember that the envelope of a 
Gaussian waveform has a Rayleigh distribution. 


In a radar system, the reflected signal pulses may have amplitudes that 
are Rayleigh distributed. Let the mean value of these pulses be $\sqrt{\pi/2}$. 
However, the only pulses that are displayed on the radar scope are those 
for which the pulse amplitude R is greater than some threshold $r_0$ in order 
that the effects of system noise can be suppressed. 


a) Determine the probability density function of the displayed pulses; 
that is, find $f(r \mid R > r_0)$. Sketch this density function. 

b) Find the conditional mean of the displayed pulses if $r_0 = 0.5$. 


A limiter has an input-output characteristic defined by 


$$V_{out} = \begin{cases} -B & V_{in} \le -A \\[4pt] \dfrac{B V_{in}}{A} & -A < V_{in} < A \\[4pt] B & V_{in} \ge A \end{cases}$$


a) If the input is a Gaussian random variable V with a mean value of $\bar{V}$ 
and a variance of $\sigma_V^2$, write a general expression for the probability 
density function of the output. 


b) If A = B = 5 and the input is uniformly distributed from — 2 to 8, 
find the mean value of the output. 


Let the input to the limiter of Problem 2-9.3(b) be 

$$V(t) = 10 \sin(\omega t + \Theta)$$

where $\Theta$ is a random variable that is uniformly distributed from 0 to $2\pi$. 
The output of the limiter is sampled at an arbitrary time t to obtain a 
random variable $V_t$. 

a) Find the probability density function of $V_t$. 

b) Find the mean value of $V_t$. 

c) Find the variance of $V_t$. 


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3—1 Two Random Variables 


All of the discussion so far has concentrated on situations involving a single ran- 
dom variable. This random variable may be, for example, the value of a voltage 
or current at a particular instant of time. It should be apparent, however, that 
saying something about a random voltage or current at only one instant of time is 
not adequate as a means of describing the nature of complete time functions. Such 
time functions, even if of finite duration, have an infinite number of random var- 
iables associated with them. This raises the question, therefore, of how one can 
extend the probabilistic description of a single random variable to include the 
more realistic situation of continuous time functions. The purpose of this section 
is to take the first step of that extension by considering two random variables. It 
might appear that this is an insignificant advance toward the goal of dealing with 
an infinite number of random variables, but it will become apparent later that this 
is really all that is needed, provided that the two random variables are separated 
in time by an arbitrary time interval. That is, if the random variables associated 
with any two instants of time can be described, then all of the information is 
available in order to carry out most of the usual types of systems analysis. Another 
situation that can arise in systems analysis is that in which it is desired to find the 
relation between the input and output of the system, either at the same instant of 
time or at two different time instants. Again, only two random variables are in- 
volved. 

In order to deal with situations involving two random variables, it is necessary 
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to extend the concepts of probability distribution and density functions that were 
discussed in the last chapter. Let the two random variables be designated as X and 
Y and define a joint probability distribution function as 


$$F(x,y) = \Pr[X \le x,\; Y \le y]$$


Note that this is simply the probability of the event that the random variable X is 
less than or equal to x and that the random variable Y is less than or equal to y. 
As such, it is a straightforward extension of the probability distribution function 
for one random variable. 

The joint probability distribution function has properties that are quite analo- 
gous to those discussed previously for a single variable. These may be summa- 
rized as follows: 


1. $0 \le F(x,y) \le 1 \qquad -\infty < x < \infty, \quad -\infty < y < \infty$

2. $F(-\infty, y) = F(x, -\infty) = F(-\infty, -\infty) = 0$

3. $F(\infty, \infty) = 1$

4. F(x,y) is a nondecreasing function as either x or y, or both, increase 

5. $F(\infty, y) = F_Y(y) \qquad F(x, \infty) = F_X(x)$


In item 5 above, the subscripts on $F_Y(y)$ and $F_X(x)$ are introduced to 
indicate that these two distribution functions are not necessarily the same 
mathematical function of their respective arguments. 

As an example of joint probability distribution functions, consider the outcomes 
of tossing two coins. Let X be a random variable associated with the first coin; let 
it have a value of 0 if a tail occurs and a value of 1 if a head occurs. Similarly 
let Y be associated with the second coin and also have possible values of 0 and 1. 
The joint distribution function, F(x, y), is shown in Figure 3-1. Note that it sat- 
isfies all of the properties listed above. 
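
The two-coin distribution function can also be tabulated directly from the definition. The following Python sketch assumes fair, independent coins (an assumption; the text does not state the coin probabilities), and the helper names `joint_cdf` and `marginal_cdf_x` are illustrative only.

```python
from fractions import Fraction
from itertools import product

# Assumption: fair, independent coins, so each (x, y) outcome has probability 1/4.
OUTCOMES = {(x, y): Fraction(1, 4) for x, y in product((0, 1), repeat=2)}

def joint_cdf(x, y):
    """F(x, y) = Pr[X <= x, Y <= y] for the two-coin example."""
    return sum(p for (xi, yi), p in OUTCOMES.items() if xi <= x and yi <= y)

def marginal_cdf_x(x):
    """F_X(x) = F(x, infinity): the constraint on Y is dropped (property 5)."""
    return joint_cdf(x, float("inf"))

for point in [(-0.5, 3), (0, 0), (0.5, 1), (1, 1)]:
    print(point, joint_cdf(*point))
```

Under these assumptions the function is zero to the left of or below the origin, rises in steps of 1/4, and equals 1 once both arguments reach 1, which matches properties 1 through 4.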

It is also possible to define a joint probability density function by differentiating 
the distribution function. Since there are two independent variables, however, this 
differentiation must be done partially. Thus, 


$$f(x,y) = \frac{\partial^2 F(x,y)}{\partial x\, \partial y} \tag{3-1}$$

and the sequence of differentiation is immaterial. The probability element is 

$$f(x,y)\, dx\, dy = \Pr[x < X \le x + dx,\; y < Y \le y + dy] \tag{3-2}$$


The properties of the joint probability density function are quite analogous to 
those of a single random variable and may be summarized as follows: 


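1. $f(x,y) \ge 0 \qquad -\infty < x < \infty, \quad -\infty < y < \infty$

2. $\displaystyle\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x,y)\, dx\, dy = 1$

3. $\displaystyle F(x,y) = \int_{-\infty}^{x}\int_{-\infty}^{y} f(u,v)\, dv\, du$

4. $\displaystyle f_X(x) = \int_{-\infty}^{\infty} f(x,y)\, dy \qquad f_Y(y) = \int_{-\infty}^{\infty} f(x,y)\, dx$

5. $\displaystyle \Pr[x_1 < X \le x_2,\; y_1 < Y \le y_2] = \int_{x_1}^{x_2}\int_{y_1}^{y_2} f(x,y)\, dy\, dx$


Figure 3-1 A joint probability distribution function. 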

Note that item 2 implies that the volume beneath any joint probability density 
function must be unity. 

As a simple illustration of a joint probability density function, consider a pair 
of random variables having a density function that is constant between $x_1$ and $x_2$ 
and between $y_1$ and $y_2$. Thus, 


$$f(x,y) = \begin{cases} \dfrac{1}{(x_2 - x_1)(y_2 - y_1)} & x_1 < x \le x_2,\; y_1 < y \le y_2 \\[6pt] 0 & \text{elsewhere} \end{cases} \tag{3-3}$$


This density function and the corresponding distribution function are shown in 
Figure 3-2. 

A physical situation in which such a probability density function could arise 
might be in connection with the manufacture of rectangular semiconductor sub- 
strates. Each substrate has two dimensions and the values of the two dimensions 
might be random variables that are uniformly distributed between certain limits. 

Figure 3-2 (a) Joint distribution and (b) density functions. 


The joint probability density function can be used to find the expected value of 
functions of two random variables in much the same way as with the single 
variable density function. In general, the expected value of any function 
g(X, Y) can be found from 


$$E[g(X,Y)] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x,y) f(x,y)\, dx\, dy \tag{3-4}$$


One such expected value that will be considered in great detail in a subsequent 
section arises when g(X, Y) = XY. This expected value is known as the 
correlation and is given by 


$$E[XY] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} xy\, f(x,y)\, dx\, dy \tag{3-5}$$


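As a simple example of the calculation, consider the joint density function shown 
in Figure 3-2(b). Since it is zero everywhere except in the specific region, (3-4) 
may be written as 


$$\begin{aligned}
E[XY] &= \int_{x_1}^{x_2} dx \int_{y_1}^{y_2} xy \left[\frac{1}{(x_2 - x_1)(y_2 - y_1)}\right] dy \\
&= \frac{1}{(x_2 - x_1)(y_2 - y_1)} \left[\frac{x^2}{2}\right]_{x_1}^{x_2} \left[\frac{y^2}{2}\right]_{y_1}^{y_2} \\
&= \frac{1}{4}(x_1 + x_2)(y_1 + y_2)
\end{aligned}$$
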
Item 4 in the above list of properties of joint probability density functions 
indicates that the marginal probability density functions can be obtained by 
integrating the joint density over the other variable. Thus, for the density 
function in Figure 3-2(b), it follows that 

$$f_X(x) = \int_{y_1}^{y_2} \frac{1}{(x_2 - x_1)(y_2 - y_1)}\, dy = \frac{1}{(x_2 - x_1)(y_2 - y_1)} \Big[\, y \,\Big]_{y_1}^{y_2} = \frac{1}{x_2 - x_1}$$

and 

$$f_Y(y) = \int_{x_1}^{x_2} \frac{1}{(x_2 - x_1)(y_2 - y_1)}\, dx = \frac{1}{(x_2 - x_1)(y_2 - y_1)} \Big[\, x \,\Big]_{x_1}^{x_2} = \frac{1}{y_2 - y_1}$$
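
The closed-form results for the uniform density, E[XY] = (x1 + x2)(y1 + y2)/4 and f_X(x) = 1/(x2 - x1), can be checked numerically. The sketch below is only an illustration: the rectangle limits are hypothetical values chosen for the demonstration, and the midpoint-rule integrator and function names are not from the text.

```python
def midpoint_integral(g, a, b, n=400):
    """Approximate the integral of g over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def uniform_joint_pdf(x1, x2, y1, y2):
    """Joint density of (3-3): constant on the rectangle, zero elsewhere."""
    height = 1.0 / ((x2 - x1) * (y2 - y1))
    return lambda x, y: height if (x1 < x <= x2 and y1 < y <= y2) else 0.0

def expected_xy(f, x1, x2, y1, y2):
    """E[XY] from (3-5), integrating only over the rectangle where f is nonzero."""
    inner = lambda x: midpoint_integral(lambda y: x * y * f(x, y), y1, y2)
    return midpoint_integral(inner, x1, x2)

def marginal_x(f, x, y1, y2):
    """f_X(x): integrate the joint density over y (property 4)."""
    return midpoint_integral(lambda y: f(x, y), y1, y2)

# Illustrative limits (hypothetical values, not from the text).
x1, x2, y1, y2 = 1.0, 3.0, 2.0, 5.0
f = uniform_joint_pdf(x1, x2, y1, y2)
e_xy = expected_xy(f, x1, x2, y1, y2)
closed_form = 0.25 * (x1 + x2) * (y1 + y2)
print(e_xy, closed_form, marginal_x(f, 2.0, y1, y2))
```

Because the integrands are linear in each variable, the midpoint rule reproduces the closed forms to within floating-point rounding here; for a general density the agreement would only be approximate.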








Exercise 3-1.1 


Consider a rectangular semiconductor substrate with dimensions hav- 
ing mean values of 1 cm and 2 cm. Assume that the actual dimen- 
sions in both directions are uniformly distributed around the means 
with maximum deviations of 0.01 cm. Find: 


a) the probability that both dimensions are larger than their mean 
values by 0.005 cm 


b) the probability that the larger dimension is greater than its 
mean value by 0.005 cm and the smaller dimension is less than 
its mean value by 0.005 cm 


c) the mean value of the area of the substrate. 


Answers: 1/16, 1/16, 2 


Exercise 3-1.2 


Two random variables X and Y have a joint probability density function 
given by 
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fiy) Ae Ur «20, 0 
= 0 x=0,y=0 
Find: 
a) the value of A for which this is a valid joint probability density 
function 
b) the probability that X > 1/2 and Y > 1/4 
c) the expected value of XY. 


Answers: 0.0821, 12, 0.0833 





3—2 Conditional Probability—Revisited 


Now that the concept of joint probability for two random variables has been in- 
troduced, it is possible to extend the previous discussion of conditional probabil- 
ity. The previous definition of the conditional probability density function left 
the given event M somewhat arbitrary—although some specific examples were 
given. In the present discussion, the event M will be related to another random 
variable, Y. 

There are several different ways in which the given event M can be defined in 
terms of Y. For example, M might be the event $Y \le y$ and, hence, Pr(M) would 
be just the marginal distribution function of Y, that is, $F_Y(y)$. From the basic 
definition of the conditional distribution function given in (2-47) of the previous 
chapter, it would follow that 


$$F_X(x \mid Y \le y) = \frac{\Pr[X \le x, M]}{\Pr(M)} = \frac{F(x,y)}{F_Y(y)} \tag{3-7}$$

Another possible definition of M is that it is the event $y_1 < Y \le y_2$. The 
definition of (2-47) now leads to 

$$F_X(x \mid y_1 < Y \le y_2) = \frac{F(x,y_2) - F(x,y_1)}{F_Y(y_2) - F_Y(y_1)} \tag{3-8}$$

In both of the above situations, the event M has a nonzero probability, that is, 
Pr(M) > 0. However, the most common form of conditional probability is one 
in which M is the event that Y = y; in almost all these cases Pr(M) = 0, since 
Y is continuously distributed. Since the conditional distribution function is defined 
as a ratio, it usually still exists even in these cases. It can be obtained from (3-8) 
by letting $y_1 = y$ and $y_2 = y + \Delta y$ and by taking a limit as $\Delta y$ 
approaches zero. Thus, 


$$\begin{aligned}
F_X(x \mid Y = y) &= \lim_{\Delta y \to 0} \frac{F(x, y + \Delta y) - F(x,y)}{F_Y(y + \Delta y) - F_Y(y)} = \frac{\partial F(x,y)/\partial y}{\partial F_Y(y)/\partial y} \\
&= \frac{\int_{-\infty}^{x} f(u,y)\, du}{f_Y(y)}
\end{aligned} \tag{3-9}$$

The corresponding conditional density function is 

$$f_X(x \mid Y = y) = \frac{\partial F_X(x \mid Y = y)}{\partial x} = \frac{f(x,y)}{f_Y(y)} \tag{3-10}$$

and this is the form that is most commonly used. By interchanging X and Y it 
follows that 

$$f_Y(y \mid X = x) = \frac{f(x,y)}{f_X(x)} \tag{3-11}$$

Because this form of conditional density function is so frequently used, it is 
convenient to adopt a shorter notation. Thus, when there is no danger of 
ambiguity, the conditional density functions will be written as 


$$f(x \mid y) = \frac{f(x,y)}{f_Y(y)} \tag{3-12}$$

$$f(y \mid x) = \frac{f(x,y)}{f_X(x)} \tag{3-13}$$


From these two equations one can obtain the continuous version of Bayes' 
theorem, which was given by (1-21) for the discrete case. Thus, eliminating f(x, y) 
leads directly to 


$$f(y \mid x) = \frac{f(x \mid y)\, f_Y(y)}{f_X(x)} \tag{3-14}$$


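It is also possible to obtain the total probability from (3-12) or (3-13) by noting 
that 


$$f_X(x) = \int_{-\infty}^{\infty} f(x,y)\, dy = \int_{-\infty}^{\infty} f(x \mid y)\, f_Y(y)\, dy \tag{3-15}$$

and 

$$f_Y(y) = \int_{-\infty}^{\infty} f(x,y)\, dx = \int_{-\infty}^{\infty} f(y \mid x)\, f_X(x)\, dx \tag{3-16}$$


These equations are the continuous counterpart of (1-20), which applied to the 
discrete case. 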
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A point that might be noted in connection with the above results is that the joint 
probability density function completely specifies both marginal density functions 
and both conditional density functions. As an illustration of this, consider a joint 
probability density function of the form 


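$$f(x,y) = \begin{cases} \dfrac{6}{5}\,(1 - x^2 y) & 0 \le x \le 1,\; 0 \le y \le 1 \\[6pt] 0 & \text{elsewhere} \end{cases}$$


Integrating this function with respect to y alone and with respect to x alone yields 
the two marginal density functions as 


$$f_X(x) = \frac{6}{5}\left(1 - \frac{x^2}{2}\right) \qquad 0 \le x \le 1$$

and 

$$f_Y(y) = \frac{6}{5}\left(1 - \frac{y}{3}\right) \qquad 0 \le y \le 1$$


From (3-12) and (3-13) the two conditional density functions may now be written 
as 


$$f(x \mid y) = \frac{1 - x^2 y}{1 - \dfrac{y}{3}} \qquad 0 \le x \le 1,\; 0 \le y \le 1$$

and 

$$f(y \mid x) = \frac{1 - x^2 y}{1 - \dfrac{x^2}{2}} \qquad 0 \le x \le 1,\; 0 \le y \le 1$$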
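
The relationships among the joint, marginal, and conditional densities in this example can be checked numerically. The sketch below assumes that the example density reads f(x, y) = (6/5)(1 - x^2 y) on the unit square (a reading of the printed equation, stated here as an assumption); the midpoint-rule integrator and the function names are illustrative.

```python
def midpoint_integral(g, a, b, n=2000):
    """Approximate the integral of g over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def joint_pdf(x, y):
    """Assumed example density: (6/5)(1 - x^2 y) on the unit square, zero elsewhere."""
    inside = 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0
    return 1.2 * (1.0 - x * x * y) if inside else 0.0

def marginal_x(x):
    """f_X(x): integrate the joint density over y."""
    return midpoint_integral(lambda y: joint_pdf(x, y), 0.0, 1.0)

def marginal_y(y):
    """f_Y(y): integrate the joint density over x."""
    return midpoint_integral(lambda x: joint_pdf(x, y), 0.0, 1.0)

def conditional_x_given_y(x, y):
    """f(x|y) = f(x, y) / f_Y(y), as in (3-12)."""
    return joint_pdf(x, y) / marginal_y(y)

# Volume under the joint density; it should be close to unity.
total = midpoint_integral(marginal_x, 0.0, 1.0, n=200)
print(total, marginal_x(0.5), marginal_y(0.5))
```

With these assumptions the numerical marginals agree with (6/5)(1 - x^2/2) and (6/5)(1 - y/3), and each conditional density integrates to one over its own variable, as any density must.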





The use of conditional density functions arises in many different situations, but 
one of the most common (and probably the simplest) is that in which some 
observed quantity is the sum of two quantities, one of which is usually considered 
to be a signal while the other is considered to be a noise. Suppose, for example, 
that a signal X(t) is perturbed by additive noise N(t) and that the sum of these 
two, Y(t), is the only quantity that can be observed. Hence, at some time instant, 
there are three random variables related by 


$$Y = X + N$$


and it is desired to find the conditional probability density function of X given the 
observed value of Y, that is, f(x|y). The reason for being interested in this is that 
the most probable value of X, given the observed value Y, may be a reasonable 
guess, or estimate, of the true value of X when X can only be observed in the 
presence of noise. From Bayes' theorem this conditional probability is 

$$f(x \mid y) = \frac{f(y \mid x)\, f_X(x)}{f_Y(y)}$$


But if X is given, as implied by f(y|x), then the only randomness about Y is the 
noise N, and it is assumed that its density function, $f_N(n)$, is known. Thus, since 
N = Y - X, and X is given, 


$$f(y \mid x) = f_N(n = y - x) = f_N(y - x)$$

The desired conditional probability density, f(x|y), can now be written as 

$$f(x \mid y) = \frac{f_N(y - x)\, f_X(x)}{f_Y(y)} = \frac{f_N(y - x)\, f_X(x)}{\int_{-\infty}^{\infty} f_N(y - x)\, f_X(x)\, dx} \tag{3-17}$$


in which the integral in the denominator is obtained from (3-16). Thus, if the a 
priori density function of the signal, $f_X(x)$, and the noise density function, $f_N(n)$, 
are known, it becomes possible to determine the conditional density function, 
f(x|y). When some particular value of Y is observed, say $y_1$, then the value of x 
for which $f(x \mid y_1)$ is a maximum is a good estimate for the true value of X. 

As a specific example of the above application of conditional probability, 
suppose that the signal random variable, X, has an exponential density function so 
that 


$$f_X(x) = \begin{cases} b \exp(-bx) & x \ge 0 \\ 0 & x < 0 \end{cases}$$


Such a density function might arise, for example, as a signal from a space probe 
in which the time intervals between counts of high-energy particles are converted 
to voltage amplitudes for purposes of transmission back to earth. The noise that 
is added to this signal is assumed to be Gaussian, with zero mean, so that its 
density function is 


$$f_N(n) = \frac{1}{\sqrt{2\pi}\, \sigma_N} \exp\left(-\frac{n^2}{2\sigma_N^2}\right)$$


The marginal density function of Y, which appears in the denominator of 
(3-17), now becomes 


$$\begin{aligned}
f_Y(y) &= \int_0^{\infty} \frac{b}{\sqrt{2\pi}\, \sigma_N} \exp\left[-\frac{(y - x)^2}{2\sigma_N^2}\right] \exp(-bx)\, dx \\
&= \frac{b}{2} \exp\left(-by + \frac{b^2 \sigma_N^2}{2}\right) \left[1 + \operatorname{erf}\left(\frac{y - b\sigma_N^2}{\sqrt{2}\, \sigma_N}\right)\right]
\end{aligned} \tag{3-18}$$


¹The error function is related to the normal probability distribution function and is defined as 

$$\operatorname{erf}(z) = \frac{2}{\sqrt{\pi}} \int_0^{z} e^{-u^2}\, du = 2\Phi(\sqrt{2}\, z) - 1$$
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It should be noted, however, that if one is interested only in locating the maxi- 
mum of f(xly), it is not necessary to evaluate fy(y) since it is not a function of x. 
Hence, for a given Y, fy(y) is simply a constant. 

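The desired conditional density function can now be written, from (3-17), as 


$$f(x \mid y) = \begin{cases} \dfrac{b}{\sqrt{2\pi}\, \sigma_N f_Y(y)} \exp\left[-\dfrac{(y - x)^2}{2\sigma_N^2}\right] \exp(-bx) & x \ge 0 \\[8pt] 0 & x < 0 \end{cases}$$

This may also be written as 

$$f(x \mid y) = \begin{cases} \dfrac{b}{\sqrt{2\pi}\, \sigma_N f_Y(y)} \exp\left\{-\dfrac{1}{2\sigma_N^2}\left[x^2 - 2(y - b\sigma_N^2)x + y^2\right]\right\} & x \ge 0 \\[8pt] 0 & x < 0 \end{cases} \tag{3-19}$$


and this is sketched in Figure 3-3 for two different values of y. 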


It was noted earlier that when a particular value of Y is observed, a reasonable 
estimate for the true value of X is that value of x which maximizes f(x|y). Since 
the conditional density function is a maximum (with respect to x) when the 
bracketed quadratic in the exponent is a minimum, it follows that this value of x 
can be determined by equating the derivative of that quadratic to zero. Thus 


$$2x - 2(y - b\sigma_N^2) = 0$$

or 

$$x = y - b\sigma_N^2 \tag{3-20}$$


is the location of the maximum, provided that $y - b\sigma_N^2 > 0$. Otherwise, there 
is no point of zero slope on f(x|y) and the largest value occurs at x = 0. Suppose, 
therefore, that the value $Y = y_1$ is observed. Then, if $y_1 > b\sigma_N^2$, the appropriate 
estimate for X is $\hat{X} = y_1 - b\sigma_N^2$. On the other hand, if $y_1 < b\sigma_N^2$, the 
appropriate estimate for X is $\hat{X} = 0$. Note that as the noise gets smaller 
($\sigma_N \to 0$), the estimate of X approaches the observed value $y_1$. 


Figure 3-3 The conditional density function, f(x|y): (a) case for $y < b\sigma_N^2$ and (b) 
case for $y > b\sigma_N^2$. 
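
The estimation rule of (3-20) is simple enough to state as a short function. The Python sketch below is illustrative only: the parameter values b = 1 and sigma_N = 1 used in the check are hypothetical, and the brute-force grid search is added merely to confirm that the closed-form rule locates the peak of the unnormalized conditional density.

```python
import math

def map_estimate(y, b, sigma_n):
    """Most probable X given Y = y for an exponential signal in Gaussian noise (3-20)."""
    return max(0.0, y - b * sigma_n ** 2)

def unnormalized_posterior(x, y, b, sigma_n):
    """Proportional to f_N(y - x) f_X(x); factors that do not depend on x are omitted."""
    if x < 0.0:
        return 0.0
    return b * math.exp(-b * x) * math.exp(-((y - x) ** 2) / (2.0 * sigma_n ** 2))

def grid_argmax(y, b, sigma_n, x_max=10.0, steps=10000):
    """Brute-force check: the grid point with the largest posterior value."""
    xs = [x_max * i / steps for i in range(steps + 1)]
    return max(xs, key=lambda x: unnormalized_posterior(x, y, b, sigma_n))

# Hypothetical parameters: b = 1, sigma_N = 1, so the threshold b * sigma_N^2 is 1.
print(map_estimate(3.0, 1.0, 1.0), grid_argmax(3.0, 1.0, 1.0))  # observation above threshold
print(map_estimate(0.5, 1.0, 1.0), grid_argmax(0.5, 1.0, 1.0))  # observation below threshold
```

Dropping f_Y(y) and the Gaussian normalizing constant is deliberate: neither depends on x, so neither changes where the maximum occurs.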





Exercise 3-2.1 


Two random variables, X and Y, have a joint probability density func- 
tion of the form 
$$f(x,y) = \begin{cases} k(x + y) & 0 \le x \le 1,\; 0 \le y \le 1 \\ 0 & \text{elsewhere} \end{cases}$$
Find: 


a) the value of k for which this is a valid joint probability density 
function 


b) the conditional probability that X is greater than 1/2 given that 
Y = 1/2 


c) the conditional probability that Y is less than, or equal to, 1/2 
given that X is 1/2. 
Answers: 3/8, 5/8, 1 


Exercise 3—2.2 


A random signal X is uniformly distributed between 6 and 10 V. It is 
observed in the presence of Gaussian noise N having zero mean and 
a standard deviation of 2 V. 


a) If the observed value of signal plus noise, (X + N), is 4, find 
the best estimate of the signal amplitude. 


b) Repeat (a) if the observed value of signal plus noise is 8. 
c) Repeat (a) if the observed value of signal plus noise is 12. 


Answers: 6,8, 10 





3—3 Statistical Independence 


The concept of statistical independence was introduced earlier in connection with 
discrete events, but is equally important in the continuous case. Random variables 
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that arise from different physical sources are almost always statistically indepen- 
dent. For example, the random thermal voltage generated by one resistor in a 
circuit is in no way related to the thermal voltage generated by another resistor. 
Statistical independence may also exist when the random variables come from the 
same source but are defined at greatly different times. For example, the thermal 
voltage generated in a resistor tomorrow almost certainly does not depend upon 
the voltage today. When two random variables are statistically independent, a 
knowledge of one random variable gives no information about the value of the 
other. 

The joint probability density function for statistically independent random vari- 
ables can always be factored into the two marginal density functions. Thus, the 
relationship 


$$f(x,y) = f_X(x)\, f_Y(y) \tag{3-21}$$


can be used as a definition for statistical independence, since it can be shown that 
this factorization is both a necessary and sufficient condition. As an example, this 
condition is satisfied by the joint density function given in (3-3). Hence, these 
two random variables are statistically independent. 

One of the consequences of statistical independence concerns the correlation 
defined by (3-5). Because the joint density function is factorable, (3—5) can be 
written as 


$$E[XY] = \int_{-\infty}^{\infty} x f_X(x)\, dx \int_{-\infty}^{\infty} y f_Y(y)\, dy = E[X]\,E[Y] = \bar{X}\,\bar{Y} \tag{3-22}$$


Hence, the expected value of the product of two statistically independent random 
variables is simply the product of their mean values. The result will be zero, of 
course, if either random variable has zero mean. 

Another consequence of statistical independence is that conditional probability 
density functions become marginal density functions. For example, from (3-12) 


$$f(x \mid y) = \frac{f(x,y)}{f_Y(y)}$$


but if X and Y are statistically independent the joint density function is factorable 
and this becomes 


$$f(x \mid y) = \frac{f_X(x)\, f_Y(y)}{f_Y(y)} = f_X(x)$$


Similarly, 

$$f(y \mid x) = \frac{f(x,y)}{f_X(x)} = \frac{f_X(x)\, f_Y(y)}{f_X(x)} = f_Y(y)$$
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It may be noted that the random variables described by the joint probability 
density function of Exercise 3—1.2 are statistically independent since the joint 
density function can be factored into the product of a function of x only and a 
function of y only. However, the random variables defined by the joint probability 
density function of Exercise 3—2.1 are not statistically independent since this den- 
sity function cannot be factored in this manner. 
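
The factorization test of (3-21) can be applied numerically by comparing a joint density with the product of its own marginals. The sketch below is an illustration under stated assumptions: it uses the density of Exercise 3-2.1 with k = 1 as the dependent case, a hypothetical product density (2x)(2y) as the independent case, and a coarse grid of test points; none of the function names come from the text.

```python
def midpoint_integral(g, a, b, n=400):
    """Approximate the integral of g over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def max_factorization_error(f, lo=0.0, hi=1.0, grid=5):
    """Largest |f(x, y) - f_X(x) f_Y(y)| over a coarse grid of test points."""
    pts = [lo + (hi - lo) * (i + 0.5) / grid for i in range(grid)]
    f_x = {x: midpoint_integral(lambda y: f(x, y), lo, hi) for x in pts}
    f_y = {y: midpoint_integral(lambda x: f(x, y), lo, hi) for y in pts}
    return max(abs(f(x, y) - f_x[x] * f_y[y]) for x in pts for y in pts)

# Density of Exercise 3-2.1 with k = 1 on the unit square: not factorable.
dependent = lambda x, y: x + y
# Hypothetical product density on the unit square: (2x)(2y).
independent = lambda x, y: 4.0 * x * y

print(max_factorization_error(dependent))
print(max_factorization_error(independent))
```

A vanishing error on a finite grid is only evidence of independence, not a proof; a clearly nonzero error, however, is enough to rule independence out.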





Exercise 3—3.1 


Two random variables, X and Y, have a joint probability density func- 
tion of the form 
$$f(x,y) = \begin{cases} k(xy + 2x + y + a) & 0 \le x \le 1,\; 0 \le y \le 1 \\ 0 & \text{elsewhere} \end{cases}$$
Find: 


a) the values of k and a for which the random variables X and Y 
are statistically independent 


b) the expected value of XY. 


Answers: 4/15, 10/9, 2 


Exercise 3—3.2 
Two random variables, X and Y, have Gaussian probability density 
functions with means of 1 and 2 respectively and variances of 1 and 
4 respectively. Find the probability that XY > 0. 


Answer: 0.7078 





3—4 Correlation Between Random Variables 


As noted above, one of the important applications of joint probability density 
functions is that of specifying the correlation of two random variables; that is, 
whether one random variable depends in any way upon another random variable. 

If two random variables X and Y have possible values x and y, then the expected 
value of their product is known as the correlation, defined in (3-5) as 


$$E[XY] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} xy\, f(x,y)\, dx\, dy = \overline{XY} \tag{3-5}$$


If both of these random variables have nonzero means, then it is frequently more 
convenient to find the correlation with the mean values subtracted out. Thus, 


$$E[(X - \bar{X})(Y - \bar{Y})] = \overline{(X - \bar{X})(Y - \bar{Y})} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x - \bar{X})(y - \bar{Y})\, f(x,y)\, dx\, dy \tag{3-23}$$


This is known as the covariance, by analogy to the variance of a single random 
variable. 

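If it is desired to express the degree to which two random variables are 
correlated without regard to the magnitude of either one, then the correlation 
coefficient or normalized covariance is the appropriate quantity. The correlation 
coefficient, which is denoted by $\rho$, is defined as 


$$\rho = E\left\{\left[\frac{X - \bar{X}}{\sigma_X}\right]\left[\frac{Y - \bar{Y}}{\sigma_Y}\right]\right\} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \left[\frac{x - \bar{X}}{\sigma_X}\right]\left[\frac{y - \bar{Y}}{\sigma_Y}\right] f(x,y)\, dx\, dy \tag{3-24}$$


Note that each random variable has its mean subtracted out and is divided by 
its standard deviation. The resulting random variable is often called the 
standardized variable and is one with zero mean and unit variance. 

An alternative, and sometimes simpler, expression for the correlation coefficient 
can be obtained by multiplying out the terms in equation (3-24). This yields 


$$\rho = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \left[\frac{xy - \bar{X}y - \bar{Y}x + \bar{X}\bar{Y}}{\sigma_X \sigma_Y}\right] f(x,y)\, dx\, dy$$


Carrying out the integration leads to 


$$\rho = \frac{E[XY] - \bar{X}\,\bar{Y}}{\sigma_X \sigma_Y} \tag{3-25}$$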


In order to investigate some of the properties of $\rho$, define the standardized 
variables $\xi$ and $\eta$ as 


$$\xi = \frac{X - \bar{X}}{\sigma_X} \qquad \eta = \frac{Y - \bar{Y}}{\sigma_Y}$$

$$\bar{\xi} = 0 \qquad \bar{\eta} = 0 \qquad \sigma_\xi^2 = 1 \qquad \sigma_\eta^2 = 1$$


Then, 

$$\rho = E[\xi\eta]$$

Now look at 


$$E[(\xi \pm \eta)^2] = E[\xi^2 \pm 2\xi\eta + \eta^2] = 1 \pm 2\rho + 1 = 2(1 \pm \rho)$$


Since $(\xi \pm \eta)^2$ is never negative, its expected value cannot be negative 
either, so that 


$$2(1 \pm \rho) \ge 0$$

Hence, $\rho$ can never have a magnitude greater than one and thus 

$$-1 \le \rho \le 1$$


If X and Y are statistically independent, then 


$$\rho = E[\xi\eta] = \bar{\xi}\,\bar{\eta} = 0$$

since both $\xi$ and $\eta$ are zero mean. Thus, the correlation coefficient for 
statistically independent random variables is always zero. The converse is not 
necessarily true, however. A correlation coefficient of zero does not automatically 
mean that X and Y are statistically independent unless they are Gaussian, as will 
be seen. 
In order to illustrate the above properties, consider two random variables for 
which the joint probability density function is 


$$f(x,y) = \begin{cases} x + y & 0 \le x \le 1,\; 0 \le y \le 1 \\ 0 & \text{elsewhere} \end{cases}$$


From Property 4 pertaining to joint probability density functions, it is 
straightforward to obtain the marginal density functions as 


$$f_X(x) = \int_0^1 (x + y)\, dy = x + \frac{1}{2} \qquad 0 \le x \le 1$$

and 

$$f_Y(y) = \int_0^1 (x + y)\, dx = y + \frac{1}{2} \qquad 0 \le y \le 1$$

from which the mean values of X and Y can be obtained immediately as 

$$\bar{X} = \int_0^1 x\left(x + \frac{1}{2}\right) dx = \frac{7}{12}$$

with an identical value for E[Y]. The variance of X is readily obtained from 

$$\sigma_X^2 = \int_0^1 \left(x - \frac{7}{12}\right)^2 \left(x + \frac{1}{2}\right) dx = \frac{11}{144}$$


Again there is an identical value for $\sigma_Y^2$. Also the expected value of XY is 
given by 

$$E[XY] = \int_0^1 \int_0^1 xy(x + y)\, dx\, dy = \frac{1}{3}$$


Hence, from (3-25) the correlation coefficient becomes 


$$\rho = \frac{E[XY] - \bar{X}\,\bar{Y}}{\sigma_X \sigma_Y} = \frac{1/3 - (7/12)^2}{11/144} = -\frac{1}{11}$$

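Although the correlation coefficient can be defined for any pair of random 
variables, it is particularly useful for random variables that are individually and 
jointly Gaussian. In these cases, the joint probability density function can be 
written as 


$$f(x,y) = \frac{1}{2\pi \sigma_X \sigma_Y \sqrt{1 - \rho^2}} \exp\left\{\frac{-1}{2(1 - \rho^2)}\left[\frac{(x - \bar{X})^2}{\sigma_X^2} + \frac{(y - \bar{Y})^2}{\sigma_Y^2} - \frac{2(x - \bar{X})(y - \bar{Y})\rho}{\sigma_X \sigma_Y}\right]\right\} \tag{3-26}$$


Note that when $\rho = 0$, this reduces to 


$$f(x,y) = \frac{1}{2\pi \sigma_X \sigma_Y} \exp\left\{-\frac{1}{2}\left[\frac{(x - \bar{X})^2}{\sigma_X^2} + \frac{(y - \bar{Y})^2}{\sigma_Y^2}\right]\right\} = f_X(x)\, f_Y(y)$$


which is the form for statistically independent Gaussian random variables. Hence, 
$\rho = 0$ does imply statistical independence in the Gaussian case. 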

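It is also of interest to use the correlation coefficient to express some results for 
general random variables. For example, from the definitions of the standardized 
variables it follows that 


$$X = \sigma_X \xi + \bar{X} \qquad \text{and} \qquad Y = \sigma_Y \eta + \bar{Y}$$


and, hence 


$$\begin{aligned}
\overline{XY} &= E[(\sigma_X \xi + \bar{X})(\sigma_Y \eta + \bar{Y})] = E[\sigma_X \sigma_Y \xi\eta + \bar{X}\sigma_Y \eta + \bar{Y}\sigma_X \xi + \bar{X}\bar{Y}] \\
&= \rho\, \sigma_X \sigma_Y + \bar{X}\,\bar{Y}
\end{aligned} \tag{3-27}$$


As a further example, consider 

$$\begin{aligned}
E[(X \pm Y)^2] &= E[X^2 \pm 2XY + Y^2] = \overline{X^2} \pm 2\,\overline{XY} + \overline{Y^2} \\
&= \sigma_X^2 + (\bar{X})^2 \pm 2\rho\, \sigma_X \sigma_Y \pm 2\bar{X}\bar{Y} + \sigma_Y^2 + (\bar{Y})^2 \\
&= \sigma_X^2 + \sigma_Y^2 \pm 2\rho\, \sigma_X \sigma_Y + (\bar{X} \pm \bar{Y})^2
\end{aligned}$$

Since the last term is just the square of the mean of $(X \pm Y)$, it follows that the 
variance of $(X \pm Y)$ is 


$$[\sigma_{(X \pm Y)}]^2 = \sigma_X^2 + \sigma_Y^2 \pm 2\rho\, \sigma_X \sigma_Y \tag{3-28}$$

Note that when random variables are uncorrelated ($\rho = 0$), the variance of the sum 
or difference is the sum of the variances. 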
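
Equation (3-28) can be used in either direction: to predict the variance of a sum or difference from the correlation coefficient, or to infer the correlation coefficient from a measured mean-square value of the sum. The Python sketch below illustrates both; the function names are hypothetical, and the numbers in the usage lines are taken from Exercises 3-4.1 and 3-4.2 purely as a check.

```python
import math

def variance_of_sum(var_x, var_y, rho, sign=+1):
    """Variance of X + Y (sign=+1) or X - Y (sign=-1), from (3-28)."""
    return var_x + var_y + sign * 2.0 * rho * math.sqrt(var_x * var_y)

def rho_from_sum(mean_x, var_x, mean_y, var_y, mean_square_sum):
    """Solve (3-28) for rho given the measured mean-square value of X + Y."""
    var_sum = mean_square_sum - (mean_x + mean_y) ** 2
    return (var_sum - var_x - var_y) / (2.0 * math.sqrt(var_x * var_y))

# Uncorrelated case: the variances simply add for both sum and difference.
print(variance_of_sum(9.0, 16.0, 0.0), variance_of_sum(9.0, 16.0, 0.0, sign=-1))

# Measured values as in Exercise 3-4.1: means 5 and 3, variances 8 and 10,
# mean-square value of the sum 75.
r = rho_from_sum(5.0, 8.0, 3.0, 10.0, 75.0)
print(r, variance_of_sum(8.0, 10.0, r))
```

Note that only second moments enter these calculations; nothing about the shape of the density functions is required.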





Exercise 3—4.1 


The result expressed in equation (3—28) may be used to measure the 
correlation coefficient of two random signals. To illustrate this, assume 
that we measure the mean and variance of a random signal to be 5 
and 8 respectively, and the mean and variance of another signal to be 
3 and 10 respectively. We then combine the two signals and measure 
the mean-square value of the sum as 75. Find: 


a) the variance of the sum 
b) the correlation of the signals 


c) the correlation coefficient. 


Answers: -0.391, 11, 11.5 


Exercise 3—4.2 


A random variable X has a variance of 9 and a statistically indepen- 
dent random variable Y has a variance of 16. Their sum is another 
random variable Z = X + Y. Without assuming that either random 
variable has zero mean, find: 


a) the correlation coefficient for X and Z 
b) the correlation coefficient for Y and Z 


c) the variance of Z. 


Answers: 25, 0.6, 0.8 





3—5 Density Function of the Sum of Two 
Random Variables 


The above example illustrates that the mean and variance associated with the sum 
(or difference) of two random variables can be determined from a knowledge of 
the individual means and variances and the correlation coefficient without any 
regard to the probability density functions of the random variables. A more diffi- 
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cult question, however, pertains to the probability density function of the sum of 
two random variables. The only situation of this sort that is considered here is the 
one in which the two random variables are statistically independent. The more 
general case is beyond the scope of the present discussion. 

Let X and Y be statistically independent random variables with density functions 
of fx(x) and fy(y), and let the sum be 


Z£Z—X--Y 


It is desired to obtain the probability density function of Z, f7(z). The situation is 
best illustrated graphically as shown in Figure 3—4. The probability distribution 
function for Z is just 


F,(z) = Pr(Z <2) = Pr(X + Y x z) 


and can be obtained by integrating the joint density function, f(x,y), over the 
region below the line, x + y = z. For every fixed y, x must be such that ~% < 
x<z — y. Thus, 


Fz(2) = |. |. f(x,y) dx dy (3-29) 


For the special case in which X and Y are statistically independent, the joint density
function is factorable and (3-29) can be written as


$$F_Z(z) = \int_{-\infty}^{\infty} \int_{-\infty}^{z-y} f_X(x) f_Y(y)\,dx\,dy = \int_{-\infty}^{\infty} f_Y(y) \int_{-\infty}^{z-y} f_X(x)\,dx\,dy$$


The probability density function of Z is obtained by differentiating $F_Z(z)$ with
respect to z. Hence


$$f_Z(z) = \frac{dF_Z(z)}{dz} = \int_{-\infty}^{\infty} f_Y(y) f_X(z - y)\,dy \tag{3-30}$$





Figure 3-4 Showing the region for X + Y = Z ≤ z.


Figure 3-5 Density functions for two random variables.


since z appears only in the upper limit of the second integral. Thus, the probability 
density function of Z is simply the convolution of the density functions of X 
and Y. 

It should also be clear that (3—29) could have been written equally well as 


$$F_Z(z) = \int_{-\infty}^{\infty} \int_{-\infty}^{z-x} f(x,y)\,dy\,dx$$


and the same procedure would lead to


$$f_Z(z) = \int_{-\infty}^{\infty} f_X(x) f_Y(z - x)\,dx \tag{3-31}$$


Hence, just as in the case of system analysis, there are two equivalent forms for 
the convolution integral. 

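As a simple example of this procedure, consider the two density functions
shown in Figure 3-5. These may be expressed analytically as


$$f_X(x) = \begin{cases} 1 & 0 \le x \le 1 \\ 0 & \text{elsewhere} \end{cases}$$

and

$$f_Y(y) = \begin{cases} e^{-y} & y \ge 0 \\ 0 & y < 0 \end{cases}$$
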
The convolution must be carried out in two parts, depending on whether z is
greater or less than one. The appropriate diagrams, based on (3-30), are sketched
in Figure 3-6. When 0 < z ≤ 1, the convolution integral becomes

$$f_Z(z) = \int_0^z (1)\,e^{-(z-x)}\,dx = 1 - e^{-z} \qquad 0 < z \le 1$$

When z > 1, the integral is

$$f_Z(z) = \int_0^1 (1)\,e^{-(z-x)}\,dx = (e - 1)e^{-z} \qquad 1 < z < \infty$$


Figure 3-6 Convolution of density functions: (a) 0 < z ≤ 1, (b) 1 < z < ∞, and
(c) $f_Z(z)$.


When z < 0, $f_Z(z) = 0$ since both $f_X(x) = 0$, x < 0 and $f_Y(y) = 0$, y < 0. The
resulting density function is sketched in Figure 3-6(c).
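
The piecewise result above can be checked numerically. The following is a minimal sketch, not part of the original development: it evaluates the convolution integral in the form (3-31) with a midpoint rule and compares it with the closed-form expressions. The function names and the step count are illustrative assumptions.

```python
import math


def f_x(x):
    """Uniform density on [0, 1]."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0


def f_y(y):
    """Exponential density e^{-y} for y >= 0."""
    return math.exp(-y) if y >= 0.0 else 0.0


def convolve_at(z, lo=0.0, hi=1.0, steps=2000):
    """Midpoint-rule estimate of the integral of f_X(x) f_Y(z - x) dx.

    The limits lo..hi span the support of f_X; the integrand is zero outside.
    """
    dx = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * dx
        total += f_x(x) * f_y(z - x)
    return total * dx


def closed_form(z):
    """Piecewise density of Z = X + Y as derived in the text."""
    if z <= 0.0:
        return 0.0
    if z <= 1.0:
        return 1.0 - math.exp(-z)
    return (math.e - 1.0) * math.exp(-z)


for z in (-0.5, 0.25, 1.0, 3.0):
    print(z, convolve_at(z), closed_form(z))
```

The two printed columns should agree to within the discretization error of the midpoint rule, which is set by the step count.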

It is straightforward to extend the above result to the difference of two random 
variables. In this case let 


$$Z = X - Y$$

All that is necessary in this case is to replace y by -y in equation (3-30). Thus,

$$f_Z(z) = \int_{-\infty}^{\infty} f_Y(y) f_X(z + y)\,dy \tag{3-32}$$

There is also an alternative expression analogous to equation (3-31). This is

$$f_Z(z) = \int_{-\infty}^{\infty} f_X(x) f_Y(x - z)\,dx \tag{3-33}$$


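It is also of interest to consider the case of the sum of two independent Gaussian
random variables. Thus let


$$f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma_X} \exp\left[-\frac{(x - \bar{X})^2}{2\sigma_X^2}\right]$$


and


$$f_Y(y) = \frac{1}{\sqrt{2\pi}\,\sigma_Y} \exp\left[-\frac{(y - \bar{Y})^2}{2\sigma_Y^2}\right]$$

Then if Z = X + Y, the density function for Z is [based on (3-31)]

$$f_Z(z) = \frac{1}{2\pi\sigma_X\sigma_Y} \int_{-\infty}^{\infty} \exp\left[-\frac{(x - \bar{X})^2}{2\sigma_X^2}\right] \exp\left[-\frac{(z - x - \bar{Y})^2}{2\sigma_Y^2}\right] dx$$


It is left as an exercise for the student to verify that the result of this integration
is


$$f_Z(z) = \frac{1}{\sqrt{2\pi(\sigma_X^2 + \sigma_Y^2)}} \exp\left\{-\frac{[z - (\bar{X} + \bar{Y})]^2}{2(\sigma_X^2 + \sigma_Y^2)}\right\}$$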


This result clearly indicates that the sum of two independent Gaussian random 
variables is still Gaussian with a mean that is the sum of the means and a variance 
that is the sum of the variances. It should also be apparent that by adding more 
random variables, the sum is still Gaussian. Thus, the sum of any number of 
independent Gaussian random variables is still Gaussian. Density functions that 
exhibit this property are said to be reproducible; the Gaussian case is one of a 
very limited class of density functions that are reproducible. Although it will not 
be proven here, it can likewise be shown that the sum of correlated Gaussian 
random variables is also Gaussian with a mean that is the sum of the means and 
a variance that can be obtained from (3—28). 
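
The claim for independent Gaussian random variables can be spot-checked numerically. The sketch below is illustrative only: it evaluates the convolution (3-31) for two Gaussian densities by a midpoint rule and compares the result with a Gaussian whose mean is the sum of the means and whose variance is the sum of the variances. The parameter values and function names are assumptions chosen for the demonstration.

```python
import math


def gaussian(x, mean, sigma):
    """Gaussian density with the given mean and standard deviation."""
    return math.exp(-(x - mean) ** 2 / (2.0 * sigma ** 2)) / (
        math.sqrt(2.0 * math.pi) * sigma
    )


def sum_density(z, mean_x, sigma_x, mean_y, sigma_y, steps=4000):
    """Midpoint-rule estimate of the integral of f_X(x) f_Y(z - x) dx."""
    # Integrate over +/- 10 standard deviations of X; the tails beyond are negligible.
    lo = mean_x - 10.0 * sigma_x
    hi = mean_x + 10.0 * sigma_x
    dx = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * dx
        total += gaussian(x, mean_x, sigma_x) * gaussian(z - x, mean_y, sigma_y)
    return total * dx


# Assumed example values: X has mean 1 and variance 9, Y has mean -2 and variance 16.
mean_z = 1.0 + (-2.0)
sigma_z = math.sqrt(3.0 ** 2 + 4.0 ** 2)
for z in (-8.0, -1.0, 4.0):
    print(z, sum_density(z, 1.0, 3.0, -2.0, 4.0), gaussian(z, mean_z, sigma_z))
```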

The fact that sums (and differences) of Gaussian random variables are still 
Gaussian is a very important one in the analysis of linear systems. It can also be 
shown that derivatives and integrals of time functions that have a Gaussian distri- 
bution are still Gaussian. Thus, one can carry out the analysis of linear systems 
for Gaussian inputs with the assurance that signals everywhere in the system are 
Gaussian. This is analogous to the use of sinusoidal functions for carrying out
steady state system analysis in which signals everywhere in the system are still 
sinusoids at the same frequency. 





Exercise 3—5.1 


Let a random variable X have a probability density function of

$$f_X(x) = 5e^{-5x} \qquad x \ge 0$$


and a statistically independent random variable Y have a probability
density function of


$$f_Y(y) = 2e^{-2y} \qquad y \ge 0$$

For the random variable Z = X + Y find:
a) fz(0) 
b) the value of z for which fz(z) is a maximum 


c) the probability that Z is greater than 1.0. 


Answers: 0, 0.221, 0.305


Exercise 3-5.2


The resistance values of a supply of resistors are independent random 
variables and are uniformly distributed between 100 and 120 ohms. If 
two resistors are selected at random and connected in series, find: 


a) the most probable value of resistance for the series combina- 
tion 

b) the largest value of resistance for the series combination 

c) the probability that the series combination will have a resis- 
tance value greater than 220 ohms. 


Answers: 220, 0.5, 240 





3—6 The Characteristic Function 


It is shown in the previous section that the probability density function of the sum 
of two independent random variables can be obtained by convolving the individual 
density functions. When more than two random variables are summed, the result- 
ing density function can obviously be obtained by repeating the convolution until 
every random variable has been taken into account. Since this is a lengthy and 
tedious procedure, it is natural to inquire if there is some easier way. 

When convolution arises in system and circuit analysis, it is well known that 
transform methods can be used to simplify the computation since convolution then 
becomes a simple multiplication of the transforms. Repeated convolution is ac- 
complished by multiplying more transforms together. Thus, it seems reasonable 
to try to use transform methods when dealing with density functions. This section 
discusses how to do it. 

The characteristic function of a random variable X is defined to be 


$$\phi(u) = E[e^{juX}] \tag{3-35}$$

and this expected value can be obtained from

$$\phi(u) = \int_{-\infty}^{\infty} f(x)\,e^{jux}\,dx \tag{3-36}$$

The right side of (3-36) is (except for a minus sign in the exponent) the Fourier
transform of the density function f(x). The difference in sign for the characteristic
function is traditional rather than fundamental, and makes no essential difference
in the application or properties of the transform. By analogy to the inverse Fourier
transform, the density function can be obtained from


$$f(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \phi(u)\,e^{-jux}\,du \tag{3-37}$$


In order to illustrate one application of characteristic functions, consider once 
again the problem of finding the probability density function of the sum of two 
independent random variables X and Y, where Z = X + Y. The characteristic 
functions for these random variables are 


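$$\phi_X(u) = \int_{-\infty}^{\infty} f_X(x)\,e^{jux}\,dx$$

and

$$\phi_Y(u) = \int_{-\infty}^{\infty} f_Y(y)\,e^{juy}\,dy$$


Since convolution corresponds to multiplication of transforms (characteristic
functions) it follows that the characteristic function of Z is


$$\phi_Z(u) = \phi_X(u)\,\phi_Y(u)$$


The resulting density function for Z becomes

$$f_Z(z) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \phi_X(u)\,\phi_Y(u)\,e^{-juz}\,du \tag{3-38}$$
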
This technique can be illustrated by reworking the example of the previous
section, in which X was uniformly distributed and Y exponentially distributed.
Since


$$f_X(x) = \begin{cases} 1 & 0 \le x \le 1 \\ 0 & \text{elsewhere} \end{cases}$$


the characteristic function is

$$\phi_X(u) = \int_0^1 (1)\,e^{jux}\,dx = \frac{e^{ju} - 1}{ju}$$

Likewise,

$$f_Y(y) = \begin{cases} e^{-y} & y \ge 0 \\ 0 & y < 0 \end{cases}$$


so that


$$\phi_Y(u) = \int_0^{\infty} e^{-y}\,e^{juy}\,dy = \frac{-1}{-1 + ju} = \frac{1}{1 - ju}$$

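Hence, the characteristic function of Z is


$$\phi_Z(u) = \phi_X(u)\,\phi_Y(u) = \frac{e^{ju} - 1}{ju(1 - ju)}$$

and the corresponding density function is

$$f_Z(z) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \frac{e^{ju} - 1}{ju(1 - ju)}\,e^{-juz}\,du$$

$$= \frac{1}{2\pi} \int_{-\infty}^{\infty} \frac{e^{-ju(z-1)}}{ju(1 - ju)}\,du - \frac{1}{2\pi} \int_{-\infty}^{\infty} \frac{e^{-juz}}{ju(1 - ju)}\,du$$


$$= 1 - e^{-z} \qquad \text{when } 0 < z < 1$$


$$= (e - 1)e^{-z} \qquad \text{when } 1 < z < \infty$$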


The integration can be carried out by standard inverse Fourier transform meth- 
ods or by the use of tables. 
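
As a rough numerical check on this example, the inversion integral (3-38) can also be evaluated directly. The sketch below is an illustration under stated assumptions, not the table-based method referred to above. Because the density is real, the integral over all u reduces to 1/π times the integral of the real part over positive u. That integral is truncated at an assumed upper limit u_max and evaluated by a midpoint rule; the names u_max and steps are hypothetical tuning parameters.

```python
import cmath
import math


def phi_x(u):
    """Characteristic function of the uniform density on [0, 1] (u != 0)."""
    return (cmath.exp(1j * u) - 1.0) / (1j * u)


def phi_y(u):
    """Characteristic function of the exponential density e^{-y}, y >= 0."""
    return 1.0 / (1.0 - 1j * u)


def density_from_cf(z, u_max=400.0, steps=80000):
    """Approximate f_Z(z) from (3-38) using only u > 0 and the real part."""
    du = u_max / steps
    total = 0.0
    for k in range(steps):
        u = (k + 0.5) * du  # midpoints avoid evaluating phi_x at u = 0
        total += (phi_x(u) * phi_y(u) * cmath.exp(-1j * u * z)).real
    return total * du / math.pi


print(density_from_cf(0.5), 1.0 - math.exp(-0.5))
print(density_from_cf(2.0), (math.e - 1.0) * math.exp(-2.0))
```

The truncation error grows near z = 0 and z = 1, where the density has corners, so those points are avoided in the comparison.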

Another application of the characteristic function is to find the moments of a
random variable. Note that if $\phi(u)$ is differentiated, the result is


$$\frac{d\phi(u)}{du} = \int_{-\infty}^{\infty} f(x)\,(jx)\,e^{jux}\,dx$$


For u = 0, the derivative becomes


$$\left.\frac{d\phi(u)}{du}\right|_{u=0} = j \int_{-\infty}^{\infty} x f(x)\,dx = j\bar{X} \tag{3-39}$$


Higher order derivatives introduce higher powers of x into the integrand so that
the general nth moment can be expressed as

$$\overline{X^n} = E[X^n] = \frac{1}{j^n} \left[\frac{d^n\phi(u)}{du^n}\right]_{u=0} \tag{3-40}$$

If the characteristic function is available, this may be much easier than carrying 
out the required integrations of the direct approach. 
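
The moment relation (3-40) can be illustrated numerically. The sketch below is an assumption-laden example rather than a general tool: it uses the characteristic function 1/(1 - ju) of the exponential density from the earlier example and replaces the derivatives at u = 0 with central finite differences. The step size h and the function names are illustrative choices.

```python
def phi(u):
    """Characteristic function of the exponential density e^{-x}, x >= 0."""
    return 1.0 / (1.0 - 1j * u)


def moment(cf, n, h=1e-3):
    """Estimate E[X^n] from (3-40) for n = 1 or 2 with central differences at u = 0."""
    if n == 1:
        derivative = (cf(h) - cf(-h)) / (2.0 * h)
    elif n == 2:
        derivative = (cf(h) - 2.0 * cf(0.0) + cf(-h)) / (h * h)
    else:
        raise ValueError("only n = 1 and n = 2 are sketched here")
    # Divide by j^n as required by (3-40); the result is real up to rounding.
    return (derivative / (1j ** n)).real


print(moment(phi, 1))  # close to the mean of the exponential density, 1
print(moment(phi, 2))  # close to its mean-square value, 2
```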
There are some fairly obvious extensions of the above results. For example, 
(3-38) can be extended to an arbitrary number of independent random variables. 


If $X_1, X_2, \ldots, X_n$ are independent and have characteristic functions of $\phi_1(u)$,
$\phi_2(u), \ldots, \phi_n(u)$, and if

$$Y = X_1 + X_2 + \cdots + X_n$$


then Y has a characteristic function of

$$\phi_Y(u) = \phi_1(u)\,\phi_2(u) \cdots \phi_n(u)$$


and a density function of

$$f_Y(y) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \phi_1(u)\,\phi_2(u) \cdots \phi_n(u)\,e^{-juy}\,du \tag{3-41}$$

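
A short sketch may help show how the product rule is used in practice. It assumes, purely for illustration, that Y is the sum of n independent random variables that are each uniform on [0, 1]. It forms the product of the n identical characteristic functions and then recovers the mean and variance of Y from finite-difference derivatives at u = 0, which should be close to n/2 and n/12. All names and the step size h are hypothetical.

```python
import cmath


def phi_uniform(u):
    """Characteristic function of a density that is uniform on [0, 1]."""
    if u == 0.0:
        return 1.0 + 0.0j  # limiting value at u = 0
    return (cmath.exp(1j * u) - 1.0) / (1j * u)


def phi_sum(u, n):
    """Characteristic function of the sum of n independent uniform variables."""
    return phi_uniform(u) ** n


def mean_and_variance_of_sum(n, h=1e-3):
    """Mean and variance of the sum from derivatives of phi_sum at u = 0."""
    first = (phi_sum(h, n) - phi_sum(-h, n)) / (2.0 * h)
    second = (phi_sum(h, n) - 2.0 * phi_sum(0.0, n) + phi_sum(-h, n)) / (h * h)
    mean = (first / 1j).real
    mean_square = (second / (1j ** 2)).real
    return mean, mean_square - mean ** 2


print(mean_and_variance_of_sum(12))  # expected to be near (6, 1)
```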
The characteristic function can also be extended to cases in which random variables
are not independent. For example, if X and Y have a joint density function
of f(x, y), then they have a joint characteristic function of


$$\phi_{XY}(u, v) = E[e^{j(uX + vY)}] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x, y)\,e^{j(ux + vy)}\,dx\,dy \tag{3-42}$$


The corresponding inversion relation is


$$f(x, y) = \frac{1}{(2\pi)^2} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \phi_{XY}(u, v)\,e^{-j(ux + vy)}\,du\,dv \tag{3-43}$$


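The joint characteristic function can be used to find the correlation between the
random variables. Thus, for example,


$$E[XY] = \overline{XY} = -\left[\frac{\partial^2 \phi_{XY}(u, v)}{\partial u\,\partial v}\right]_{u=v=0} \tag{3-44}$$

More generally,

$$E[X^i Y^k] = \overline{X^i Y^k} = \frac{1}{j^{i+k}} \left[\frac{\partial^{i+k} \phi_{XY}(u, v)}{\partial u^i\,\partial v^k}\right]_{u=v=0} \tag{3-45}$$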


The results given in (3—40), (3—43), and (3—45) are particularly useful in the case 
of Gaussian random variables since the necessary integrations and differentiations 
can always be carried out. One of the valuable properties of Gaussian random 
variables is that moments and correlations of all orders can be obtained from only 
a knowledge of the first two moments and the correlation coefficient. 
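
The correlation formula (3-44) can be tried on a Gaussian example. The sketch below assumes the standard joint characteristic function of two zero-mean jointly Gaussian random variables, which is not derived in this section, and approximates the mixed partial derivative at the origin with a four-point finite difference. The result should be close to the product of the correlation coefficient and the two standard deviations. Function names and the step size h are illustrative.

```python
import math


def phi_xy(u, v, sigma_x, sigma_y, rho):
    """Joint characteristic function of zero-mean jointly Gaussian X and Y.

    Assumed standard form: exp(-(sx^2 u^2 + 2 rho sx sy u v + sy^2 v^2) / 2).
    """
    q = (sigma_x * u) ** 2 + 2.0 * rho * sigma_x * sigma_y * u * v + (sigma_y * v) ** 2
    return math.exp(-0.5 * q)


def correlation_from_cf(sigma_x, sigma_y, rho, h=1e-3):
    """Estimate E[XY] from (3-44) using a central mixed finite difference."""
    mixed = (
        phi_xy(h, h, sigma_x, sigma_y, rho)
        - phi_xy(h, -h, sigma_x, sigma_y, rho)
        - phi_xy(-h, h, sigma_x, sigma_y, rho)
        + phi_xy(-h, -h, sigma_x, sigma_y, rho)
    ) / (4.0 * h * h)
    return -mixed


# Assumed example: standard deviations 2 and 3, correlation coefficient 0.5.
print(correlation_from_cf(2.0, 3.0, 0.5))  # expected to be near 0.5 * 2 * 3 = 3
```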





Exercise 3—6.1 


For the two random variables in Exercise 3—5.1, find the probability 
density function of Z = X + Y by using the characteristic function. 


Answer: Same as found in Exercise 3-5.1.


Exercise 3-6.2


A random variable X has a probability density function of the form

$$f(x) = 2e^{-4|x|} \qquad -\infty < x < \infty$$

Using the characteristic function, find the 1st and 2nd moments of this 

random variable. 


Answers: 0, 1/8 





PROBLEMS 


3-1.1 Two random variables have a joint probability distribution function de- 


fined by 
$$F(x, y) = \begin{cases} 0 & x < 0,\ y < 0 \\ xy & 0 \le x \le 1,\ 0 \le y \le 1 \\ 1 & x > 1,\ y > 1 \end{cases}$$


a) Sketch this distribution function. 
b) Find the joint probability density function and sketch it. 


3 | 
c) Find the joint probability of the event X = " and Y — 4 


3-1.2 Two random variables, X and Y, have a joint probability density function 
given by 
f(x, y) = ky   0 ≤ x ≤ 1, 0 ≤ y ≤ 1
= 0 elsewhere


a) Determine the value of k that makes this a valid probability density 
function. 

b) Determine the joint probability distribution function F(x, y). 

l 


l 
c) Find the joint probability of the event X = 2 and Y > > 


d) Find the marginal density function, $f_X(x)$.


3-1.3 a) For the random variables of Problem 3-1.1 find E[XY].

b) For the random variables of Problem 3-1.2 find E[XY].


3-1.4 Let X be the outcome from rolling one die and Y the outcome from rolling
a second die.

a) Find the joint probability of the event X = 3 and Y > 3.

b) Find E[XY].

c) Find $E\left[\dfrac{X}{Y}\right]$.


3-2.1 A signal X has a Rayleigh density function and a mean value of 10 and
is added to noise, N, that is uniformly distributed with a mean value of
zero and a variance of 12. X and N are statistically independent and can
be observed only as Y = X + N.

a) Find, sketch, and label the conditional probability density function,
$f(x|y)$, as a function of x for y = 0, 6, and 12.

b) If an observation yields a value of y = 12, what is the best estimate
of the true value of X?


3-2.2 For the joint probability density function of Problem 3-1.2, find:

a) The conditional probability density function $f(x|y)$.

b) The conditional probability density function $f(y|x)$.


3-2.3 A dc signal having a uniform distribution over the range from -5 V to
+5 V is measured in the presence of an independent noise voltage having
a Gaussian distribution with zero mean and a variance of 2 V$^2$.

a) Find, sketch, and label the conditional probability density function of
the signal given the value of the measurement.

b) Find the best estimate of the signal voltage if the measurement
is 6 V.


c) Find the best estimate of the noise voltage if the measurement
is 7 V.


3-2.4 A random signal X can be observed only in the presence of independent
additive noise N. The observed quantity is Y = X + N. The joint probability
density function of X and Y is

f(x,y) = K exp[—Q? + y! + 4Axy)) all x and y

a) Find a general expression for the best estimate of X as function of
the observation Y = y.

b) If the observed value of Y is y = 3, find the best estimate of X.


3-3.1 For each of the following joint probability density functions state whether
the random variables are statistically independent and find E[XY].

a) Jy = © Vay LF 1lsys2
y
= 0 elsewhere

b) foy) = k^ + y) 0zsysl0sxym]
= 0 elsewhere

c) $f(x, y) = k(xy + 2x + 3y + 6)$   0 ≤ x ≤ 1, 0 ≤ y ≤ 1
= 0 elsewhere


3-3.2 Let X and Y be statistically independent random variables. Let W = g(X)
and V = h(Y) be any transformations with continuous derivatives on X
and Y. Show that W and V are also statistically independent random variables.


3-4.1 Two random variables have zero mean and variances of 16 and 36. Their
correlation coefficient is 0.5.

a) Find the variance of their sum.

b) Find the variance of their difference.

c) Repeat (a) and (b) if the correlation coefficient is -0.5.


3-4.2 Two statistically independent random variables, X and Y, have variances
of $\sigma_X^2 = 9$ and $\sigma_Y^2 = 25$. Two new random variables are defined by


U = 3X + 4Y
V = 5X - 2Y

a) Find the variances of U and V.

b) Find the correlation coefficient of U and V.


3-4.3 Let X be a zero-mean random variable having a variance of 9 and let Y
be another zero-mean random variable. The sum of X and Y has a variance
of 29 and the difference of X and Y has a variance of 21.

a) Find the variance of Y.

b) Find the correlation coefficient of X and Y.

c) Find the variance of U = 3X - 5Y.


3-4.4 Three zero mean, unit variance random variables X, Y, and Z are added
to form a new random variable, W = X + Y + Z. Random variables X
and Y are uncorrelated, X and Z have a correlation coefficient of 1/2, and
Y and Z have a correlation coefficient of -1/2.

a) Find the variance of W.

b) Find the correlation coefficient between W and X.

c) Find the correlation coefficient between W and the sum of Y and Z.


3-5.1 A random variable X has a probability density function of

$f_X(x) = 2x$   0 ≤ x ≤ 1
= 0 elsewhere

and an independent random variable Y is uniformly distributed between
-1.0 and 1.0.

a) Find the probability density function of the random variable Z = X
+ 2Y.

b) Find the probability that 0 < Z ≤ 1.


3-5.2 A commuter attempts to catch the 8:00 am train every morning although
his arrival time at the station is a random variable that is uniformly distributed
between 7:55 am and 8:05 am. The train's departure time from
the station is also a random variable that is uniformly distributed between
8:00 am and 8:10 am.


a) Find the probability density function of the time interval between the
commuter's arrival at the station and the train's departure time.

b) Find the probability that the commuter will catch the train.

c) If the commuter gets delayed 3 minutes by a traffic jam, find the
probability that the train will still be at the station.


3-5.3 A sinusoidal signal has the form

$$X(t) = \cos(100t + \theta)$$

where $\theta$ is a random variable that is uniformly distributed between 0 and
$2\pi$. Another sinusoidal signal has the form

$$Y(t) = \cos(100t + \psi)$$

where $\psi$ is independent of $\theta$ and is also uniformly distributed between 0
and $2\pi$. The sum of these two sinusoids, Z(t) = X(t) + Y(t), can be
expressed in terms of its magnitude and phase as

$$Z(t) = A \cos(100t + \phi)$$

a) Find the probability that A — 1.

b) Find the probability that A —


3-5.4 Many communication systems connecting computers employ a technique
known as "packet transmission." In this type of system, a collection of
binary digits (perhaps 1000 of them) are grouped together and transmitted
as a "packet." The time interval between packets is a random variable
that is usually assumed to be exponentially distributed with a mean value
that is the reciprocal of the average number of packets per second that are
transmitted. Under some conditions it is necessary for a user to delay
transmission of a packet by a random amount that is uniformly distributed
between 0 and T. If a user is generating 100 packets per second, and his
maximum delay time, T, is 1 ms, find:

a) The probability density function of the time interval between packets.

b) The mean value of the time interval between packets.


3-6.1 A random variable X has a probability density function of the form

$$f_X(x) = e^{-x}u(x)$$

and an independent random variable Y has a probability density function
of

$$f_Y(y) = 3e^{-3y}u(y)$$

Using characteristic functions, find the probability density function of
Z = X + Y.


3-6.2 a) Find the characteristic function of a Gaussian random variable with
zero mean and variance $\sigma^2$.


b) Using the characteristic function, verify the result in Section 2—5 for 
the nth central moment of a Gaussian random variable. 


3—6.3 The characteristic function of the Bernoulli distribution is 
$$\phi(u) = 1 - p + pe^{ju}$$
where p is the probability that the event of interest will occur at any one 
trial. Find: 
a) The mean value of the Bernoulli random variable. 
b) The mean-square value of the random variable. 


c) The 3rd central moment of the random variable.


Elements of Statistics


4—1 Introduction 


Now that we have completed an introductory study of probability and random 
variables, it is desirable to turn our attention to some of the important engineering 
applications of these concepts. One such application is in the field of statistics. 
Although our major objective in this text is to apply probabilistic concepts to the 
study of signals and systems, the field of statistics is of such importance to the 
engineer that it would not be appropriate to proceed without a brief discussion of 
the subject. Therefore, the objective of this chapter is to present a very brief 
introduction to some of the elementary concepts of statistics before turning all of 
our attention to signals and systems. It may be noted, however, that this material 
may be omitted without jeopardizing the understanding of subsequent chapters if 
time does not permit its inclusion. 

Probability and statistics are often considered to be one and the same subject 
and they are often linked together in courses and textbooks. However, they are 
really two different areas of study even though statistics relies heavily upon prob- 
abilistic concepts. In fact, the usual definition of statistics makes no reference at 
all to probability. Instead, it defines statistics as the science of assembling, clas- 
sifying, tabulating, and analyzing data or facts. In apparent agreement with this 
definition, a popular undergraduate textbook on statistics does not even discuss 
probability until the eighth chapter! 

There are two general branches of statistics that are frequently designated as 
descriptive statistics and inductive statistics or statistical inference. Descriptive
statistics involves collecting, grouping, and presenting data in a way that can be
easily understood or assimilated. Statistical inference, on the other hand, uses the 
data to draw conclusions about, or estimate parameters of, the environment from 
which the data came. 

The field of statistics is very large and includes a great many areas of specialty. 
For our purposes, however, it is convenient to classify them into five theoretical 
areas. They are: 


a) Sampling theory, which deals with problems associated with selecting sam- 
ples from some collection of data that is too large to be examined com- 
pletely. 

b) Estimation theory, which is concerned with making some estimate or pre- 
diction based on the data that is available. 

c) Hypothesis testing, which attempts to decide which of two or more hy- 
potheses about the data are true. 

d) Curve fitting and regression, which attempts to find mathematical expres- 
sions that best represent the data. 

e) Analysis of variance, which attempts to assess the significance of variations 
in the data and the relation of these variations to the physical situations 
from which the data arose. 


One cannot hope to cover all of these topics in one brief chapter, so we will limit 
our attention to some simple concepts associated with sampling theory, a brief 
exposure to hypothesis testing, and a short discussion and example of linear 
regression. 


4—2 Sampling Theory—The Sample Mean 


A problem that often arises in connection with quality control of manufactured 
items is that of determining whether the items are meeting the desired quality 
standards without actually testing all of them. Usually, the number of items being 
manufactured is so large that it would be impractical to test every one. The alter- 
native is to test only a few items and hope that these few are representative of all 
the items. Similar problems arise in connection with taking polls of public opin- 
ion, in determining the popularity of certain television programs, or in determin- 
ing any sort of average about the general population. 

Problems of the type listed above are solved by sampling the collection of items 
or facts that are being considered. A sufficient number of samples must be taken 
in order to obtain an answer in which one has reasonable confidence. Clearly, one 
would not predict the outcome of a presidential election by taking the result of 
asking the first person met on the street. Nor would one claim that one million 
transistors are all good or all bad on the basis of testing only one of them. On the 
other hand, it may be very expensive and time consuming to take samples; thus,
it is important not to take more samples than are actually required. One of the
purposes of this section is to determine how many samples are required for a given 
degree of confidence in the result. 

It is necessary to introduce some terminology in connection with sampling. The 
collection of data that is being studied is known as the population. For example, 
if a production line is set up to make a particular device, then all of these devices 
that are produced in a given run become the population. If one is concerned with 
predicting the outcome of an election, then the population is all persons voting in 
that election. The number of items or pieces of data that make up the population 
is designated as N. This is said to be the size of the population. If N is not a very 
large number, then its value may be significant. On the other hand, if N is very 
large it is often convenient to assume that it is infinity. The calculations for infinite 
populations are somewhat easier to carry out than for finite values of N, and, as 
will be seen, for very large N it makes very little difference whether the actual 
value of N is used or if one assumes N is infinite. 

A sample, or more precisely a random sample, is simply part of the population 
that has been selected at random. As mentioned in Chapter 1, the term "selected
at random" implies that all members of the population are equally likely to be
selected. This is a very important consideration and one must often go to consid- 
erable difficulty to ensure that all members of the population do have an equal 
probability of being selected. The number of items or pieces of data in the sample 
is denoted as n and is called the size of the sample. 

There are a number of calculations that can be made with the members of the 
sample and one of the most important of these is the sample mean. For most 
engineering purposes, every item in the sample can be assigned a numerical value. 
Obviously, there are other types of samples, such as might arise in public opinion 
sampling, where numerical values cannot be assigned; we are not going to be 
concerned with such situations. For our purposes, let us assume that we have a 
sample of size n drawn from a population of size N, and that each element of the 
sample has a numerical value that is designated by $x_1, x_2, \ldots, x_n$. For example,
if we are testing bipolar transistors these x-values might be the dc current gain, $\beta$.
We also assume that we have a truly random sample so that the elements we have 
are truly representative of the entire population. The sample mean is simply the 
average of the numerical values that make up the sample. Hopefully, this average 
value will be close to the average value of the population from which the sample 
is drawn. How close it might be is one of the problems addressed here. 

When one has a particular sample, the sample mean is denoted by


$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \tag{4-1}$$

where the $x_i$ are the particular values in the sample. More generally, however, we
are interested in describing the statistical properties of arbitrary random samples
rather than those of any particular sample. In this case, the sample mean becomes
a random variable, as do the members of the sample. Thus, it is appropriate to
denote the sample mean as


$$\hat{\bar{X}} = \frac{1}{n} \sum_{i=1}^{n} X_i \tag{4-2}$$


where the $X_i$ are random variables from the population and each is assumed to
have the population probability density function f(x). Note that the notation here 
is consistent with that used previously in connection with random variables; capi- 
tal letters being used for random variables and small letters for possible values of 
the random variable. This notation is used throughout this chapter and it is impor- 
tant to distinguish general results, which deal with random variables, from specific 
cases in which particular values are used. 

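The true mean value of the population from which the sample came is denoted
by $\bar{X}$. Hopefully, the sample mean will be close to this value. Since the sample
mean, in the general case, is a random variable, it also has a mean value. Thus,


$$E[\hat{\bar{X}}] = E\left[\frac{1}{n} \sum_{i=1}^{n} X_i\right] = \frac{1}{n} \sum_{i=1}^{n} E[X_i] = \frac{1}{n} \sum_{i=1}^{n} \bar{X} = \bar{X}$$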


It is clear from this result that the mean value of the sample mean is equal to the 
true mean value of the population. It is said, therefore, that the sample mean is 
an unbiased estimate of the population mean. The term "unbiased estimate" is
one that arises often in the study of statistics and it simply implies that the mean 
value of the estimate of any parameter is the same as the true mean value of the 
parameter. 

Although it is certainly desirable for the sample mean to be an unbiased esti- 
mate of the true mean, this is not sufficient to indicate whether the sample mean 
is a good estimator of the true population mean. Since the sample mean is itself a 
random variable, it will have a value that fluctuates around the true population 
mean as different samples are drawn. Therefore, it is desirable to know something 
about the magnitude of this fluctuation; that is, to determine the variance of the 
sample mean. This is done first for the case in which the population size is very 
much greater than the sample size; that is, $N \gg n$. In such cases, it is reasonable to
assume that the characteristics of the population do not change as the sample is
drawn. It is also equivalent to assuming that $N = \infty$.

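In order to calculate the variance, we look at the difference between the mean-
square value of $\hat{\bar{X}}$ and the square of the mean value of $\hat{\bar{X}}$, which, as we have just
seen, is the true mean of the population, $\bar{X}$. Thus,

$$\operatorname{Var}(\hat{\bar{X}}) = E[(\hat{\bar{X}})^2] - (\bar{X})^2 = \frac{1}{n^2} \sum_{i=1}^{n} \sum_{j=1}^{n} E[X_i X_j] - (\bar{X})^2 \tag{4-3}$$


Since $X_i$ and $X_j$ are parameters of different items in the population, it is reasonable
to assume that they are statistically independent random variables when $i \ne j$.
Hence, it follows that


$$E[X_i X_j] = \begin{cases} \overline{X^2} & i = j \\ (\bar{X})^2 & i \ne j \end{cases}$$

Using this result in (4-3) leads to


$$\operatorname{Var}(\hat{\bar{X}}) = \frac{1}{n^2}\left[n\overline{X^2} + (n^2 - n)(\bar{X})^2\right] - (\bar{X})^2 = \frac{\overline{X^2} - (\bar{X})^2}{n} = \frac{\sigma^2}{n} \tag{4-4}$$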


where $\sigma^2$ is the true variance of the population. Note that the variance of the
sample mean can be made small by making n large. This suggests that large 
sample sizes lead to a better estimate of the population mean, since the expected 
value of the sample mean is always equal to the true population mean, regardless 
of sample size, but the variance of the sample mean decreases as n gets large. 
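
The two properties just derived, an unbiased sample mean and a variance of σ²/n, can be confirmed exactly for a small case by enumerating every possible sample. The sketch below assumes a made-up population of ten values and draws samples with replacement, so that each draw sees the same population, which mimics the large-N assumption. Exact fractions are used to avoid rounding; the function names are illustrative.

```python
from fractions import Fraction
from itertools import product


def mean(values):
    """Exact arithmetic mean of an iterable of Fractions."""
    values = list(values)
    return sum(values, Fraction(0)) / len(values)


def variance(values):
    """Exact variance (divisor equal to the number of values)."""
    values = list(values)
    m = mean(values)
    return mean((v - m) ** 2 for v in values)


def sample_mean_stats(population, n):
    """Mean and variance of the sample mean over all ordered samples of size n.

    Every draw is taken from the full population (sampling with replacement).
    """
    pop = [Fraction(v) for v in population]
    sample_means = [mean(s) for s in product(pop, repeat=n)]
    return mean(sample_means), variance(sample_means)


# Assumed population: five values of 10 and five values of 100.
population = [10] * 5 + [100] * 5
pop_mean = mean(Fraction(v) for v in population)
pop_var = variance(Fraction(v) for v in population)
print(pop_mean, pop_var)
print(sample_mean_stats(population, 3))  # compare with (pop_mean, pop_var / 3)
```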

As noted previously, the result given in (4—4) assumed that N was very large. 
There is an alternative approach to sampling that leads to the same result as as- 
suming a large population. Recall that the basic reason for assuming that the 
population size is very large is to assure that the statistical characteristics of the 
population do not change as we withdraw the members of the sample. For example,
suppose we have a population consisting of five 10-Ω resistors and five 100-Ω
resistors. Withdrawing even one resistor will leave the remaining population
with a significantly different proportion of the two resistor types. However, if the
population consisted of one million 10-Ω resistors and one million 100-Ω resistors
then withdrawing one resistor, or even a thousand resistors, is not going to alter
the composition of the remaining population significantly. The same sort of free- 
dom from changing population characteristics can be achieved by replacing an 
item that is withdrawn after it has been examined, tested, and recorded. Since 
every item is drawn from exactly the same population, the effect of having an 
infinite population is achieved. Of course, one may select an item that has already 
been examined, but if the selection is done in a truly random fashion this will 
make no difference to the validity of the conclusions that might be drawn. Sam- 
pling done in this manner is said to be sampling with replacement. 

There may be situations, of course, in which one may not wish to replace a 
sample or may be unable to replace it. For example, if the testing to be done is a
life test, or a test that involves destroying the item, replacement is not possible.
Similarly, in a public opinion poll or TV program survey, one simply does not 
wish to question the same person twice. In such situations, it is still possible to 
calculate the variance of the sample mean even when the population size is quite 
small. The mathematical expression for this, which is simply quoted here without 
proof, is 





$$\operatorname{Var}(\hat{\bar{X}}) = \frac{\sigma^2}{n} \left(\frac{N - n}{N - 1}\right) \tag{4-5}$$

Note that as N becomes very large, this expression approaches the previous one. 
Note also, that if N = n, the variance of the sample mean becomes zero. This must be the
case because this condition corresponds to every item in the population being 
sampled and, hence, the sample mean must be exactly the same as the population 
mean. It is clear, however, that one would not do this if destructive testing were 
involved! Two examples serve to illustrate the above ideas. The first example 
considers a case in which the population size is infinite or very large. Suppose we 
have a random waveform such as illustrated in Figure 4—1 and we wish to estimate 
the mean value of this waveform, which, we shall assume, has a true mean value 
of 10 and a true variance of 9. 

As indicated in Figure 4—1, the value of this waveform is being sampled at 
equally spaced time instants $t_1, t_2, \ldots, t_n$. In the general situation, these sample 
values are random variables and are denoted by $X_i = X(t_i)$ for $i = 1, 2, \ldots, n$. 
We would like to find how many samples should be taken to estimate the mean 
value of this waveform with a standard deviation that is only one percent of the 
true mean value. If we assume that the waveform lasts forever, so that the popu- 
lation of time samples is infinite, then from (4—4) 

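\[ \operatorname{Var}(\hat{\bar{X}}) = \frac{\sigma^2}{n} = \frac{9}{n} = (0.01 \times 10)^2 = 0.01 \]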


in which the two right-hand terms are the desired variance of the estimate and 
correspond to a standard deviation of one percent of the true mean. Thus, 


\[ n = \frac{9}{0.01} = 900 \]


This result indicates that the sample size must be quite large in most cases of 
sampling an infinite population, or in sampling with replacement, if it is desired 
to obtain a sample mean with a small variance. 
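
As a check on the arithmetic above, the following short Python sketch solves (4-4) for the sample size. The function name and its arguments are illustrative and do not come from the text; exact rational arithmetic is used only to avoid rounding trouble when the result is rounded up to an integer.

```python
from fractions import Fraction
import math


def sample_size_infinite(variance, mean, fraction):
    """Illustrative helper (name not from the text).

    Smallest n for which sqrt(variance / n) <= fraction * |mean|,
    obtained by solving Var = sigma^2 / n, as in (4-4), for n.
    """
    target_var = (Fraction(str(fraction)) * Fraction(str(mean))) ** 2
    return math.ceil(Fraction(str(variance)) / target_var)


# Waveform example: true mean 10, true variance 9, one percent of the mean.
n = sample_size_infinite(variance=9, mean=10, fraction=0.01)
print(n)
```

Because n varies inversely with the square of the allowed standard deviation, halving the allowed standard deviation quadruples the required sample size.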

Of course, estimating the mean value of the random time function with the 
specified variance does not necessarily imply that the estimate is really within one 
percent of the true mean. It is possible, however, to determine the probability that the 
estimate of the mean is within one percent (or any amount) of the true mean. In 
order to do this, the probability density function of the estimate must be known. 

Figure 4-1 A random waveform that is being sampled. 


In the case of a large sample size, the central limit theorem comes to the rescue 
and assures us that since the estimated mean is related to the sum of a large 
number of independent random variables, the sum is very nearly Gaussian regard- 
less of the density function of the individual sample values. Thus, we can say that 


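the probability that $\hat{\bar{X}}$ is within one percent of $\bar{X}$ is 

\[
\begin{aligned}
\Pr(9.9 < \hat{\bar{X}} < 10.1) &= F(10.1) - F(9.9) \\
&= \Phi\!\left(\frac{10.1 - 10}{0.1}\right) - \Phi\!\left(\frac{9.9 - 10}{0.1}\right) = \Phi(1) - \Phi(-1) = 2\Phi(1) - 1 \\
&= 2 \times 0.8413 - 1 = 0.6826
\end{aligned}
\]
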


Hence, there is a significant probability (0.3174) that the estimate of the popula- 
tion mean is actually more than one percent away from the true population mean. 
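
The same probability can be evaluated numerically. The sketch below assumes the Gaussian model for the sample mean that the central limit theorem suggests and uses `statistics.NormalDist` from the Python standard library; the variable names are illustrative only.

```python
from statistics import NormalDist

true_mean = 10.0
sigma_of_estimate = 0.1  # one percent of the true mean, as designed above

# Gaussian model for the sample mean (an assumption justified by the
# central limit theorem for a sample of 900 independent values).
estimate = NormalDist(mu=true_mean, sigma=sigma_of_estimate)

p_within = estimate.cdf(10.1) - estimate.cdf(9.9)
p_outside = 1.0 - p_within
print(round(p_within, 4), round(p_outside, 4))
```

The computed values agree with the text to about three decimal places; any difference in the fourth place arises only because the text uses the rounded table value 0.8413.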

The assumption of a Gaussian probability density function for sample means is 
quite realistic when the sample size is large, but may not be very good for small 
sample sizes. A method of dealing with small sample sizes is discussed in a sub- 
sequent section. 

The second example considers a situation in which the population size is not 
large and sampling is done without replacement. In this example, there is a pop- 
ulation of 100 bipolar transistors for which one wishes to estimate the mean value 
of the current gain, $\beta$. If the true population mean is $\bar{\beta} = 120$ and the true 
population variance is $\sigma_\beta^2 = 25$, how large a sample size is required to obtain a 
sample mean that has a standard deviation that is one percent of the true mean? 
Since the desired variance of the sample mean is 


\[ \operatorname{Var}(\hat{\bar{\beta}}) = (0.01 \times 120)^2 = 1.44 \]


it follows from (4-5) that 

\[ \frac{25}{n}\left(\frac{100 - n}{100 - 1}\right) = 1.44 \]

This may be solved for n to yield n = 14.92, which implies a sample size of 15 
since n must be an integer. This relatively small sample size is a consequence of 
having a small population size. In this case, for example, a sample size of 100 
(that is, sampling every item) would result in a variance of the sample mean of 
exactly zero. 
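
The solution for n can be sketched in a few lines of Python. Setting the right side of (4-5) equal to the desired variance V and solving gives n = σ²N / [V(N − 1) + σ²]; the function names below are illustrative and are not taken from the text.

```python
import math


def sample_size_finite(variance, population_size, target_var):
    """Illustrative helper: real-valued n that solves
    (sigma^2 / n) * (N - n) / (N - 1) = target_var."""
    return variance * population_size / (target_var * (population_size - 1) + variance)


def var_of_sample_mean(variance, population_size, n):
    """Variance of the sample mean without replacement, as in (4-5)."""
    return (variance / n) * (population_size - n) / (population_size - 1)


# Transistor example: N = 100, population variance 25, desired variance 1.44.
n_exact = sample_size_finite(25.0, 100, 1.44)
n_needed = math.ceil(n_exact)
print(round(n_exact, 2), n_needed)

# Variance actually achieved with n_needed items, and with the whole population.
print(var_of_sample_mean(25.0, 100, n_needed), var_of_sample_mean(25.0, 100, 100))
```

Rounding n up rather than to the nearest integer guarantees that the achieved variance does not exceed the desired value.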

It is also possible to calculate the probability that the sample mean is within 
one percent of the true population mean, but it is not reasonable in this case to 
assume that the sample mean has a Gaussian density function unless, of course, 
the original $\beta$ random variables are Gaussian. This is because the sample size of 
15 is too small for the central limit theorem to be effective. As a rule of thumb, 
it is often assumed that a sample size of at least 30 is required to make the 
Gaussian assumption. A technique for dealing with smaller sample sizes is consid- 
ered when sampling distributions are discussed. 





Exercise 4—2.1 


An endless production line is turning out solid-state diodes and every 
100th diode is tested for reverse current $I_{-1}$ and forward current $I_1$ at 
diode voltages of -1 and +1, respectively. 


a) If the random variable $I_{-1}$ has a true mean value of $10^{-9}$ and 
a variance of $10^{-17}$, how many diodes must be tested to obtain 
a sample mean whose standard deviation is five percent of the 
true mean? 


b) If the random variable $I_1$ has a true mean value of 0.1 and a 
variance of 0.0025, how many diodes must be tested to obtain 
a sample mean whose standard deviation is one percent of the 
true mean? 


c) If the larger of the two numbers found in (a) and (b) is used for 
both tests, what will the standard deviations of the sample 
mean be for each test? 


Answers: $5 \times 10^{-11}$, 2500, 0.00079, 4000 


Exercise 4—2.2 


A population of 80 resistors is to be tested without replacement to 
obtain a sample mean whose standard deviation is two percent of the 
true population mean. 

a) How large must the sample size be if the true population mean 
is 100 Ω and the true standard deviation is 5 Ω? 


b) How large must the sample size be if the true population mean 
is 100 Ω and the true standard deviation is 1 Ω? 


c) If the sample size is 10, what is the standard deviation of the 
sample mean for the population of part (b)? 


Answers: 1, 6, 0.1 





4—3 Sampling Theory—The Sample Variance 


In the previous section, we discussed estimating the mean value of a population 
of random variables by averaging the values in the sample taken from that popu- 
lation. We also determined the variance of that estimate and indicated how it 
influenced the sample size. However, in addition to the mean value, we may also 
be interested in estimating the variance of the random variables in the population. 
A knowledge of the variance is important because it indicates something about the 
spread of values around the mean. For example, it is not sufficient to test resistors 
and find that the sample mean is very close to the desired resistance value. If the 
standard deviation of the resistance values is very large, then regardless of how 
close the sample mean is, many of the resistors can be quite far from the desired 
value. Hence, it is necessary to control the variance of the population as well as 
its mean. 

There is also another reason for wanting to estimate the variance of the popu- 
lation. You may recall that the population variance is needed in order to determine 
the sample size required to achieve a desired variance of the sample mean. Ini- 
tially, one may not know the population variance and, thus, not have any idea as 
to how large the sample size should be. Estimating the population variance will at 
least provide some information as to how the sample size should be changed to 
achieve the desired results. 

The sample variance is denoted initially by $S^2$, the change in notation being 
adopted in order to avoid undue notational complexity in distinguishing among 
the several variances. In terms of the random variables in the sample, $X_1, \ldots, X_n$, 
the sample variance may be defined as 

\[ S^2 = \frac{1}{n}\sum_{i=1}^{n}\left(X_i - \frac{1}{n}\sum_{j=1}^{n} X_j\right)^2 \tag{4-6} \]

Note that the second summation in this expression is just the sample mean, so the 
entire expression represents the sample mean of the square of the difference be- 
tween the random variables and the sample mean. 

The expected value of $S^2$ can be obtained by expanding the squared term in (4-6) 
and taking the expected value of each term in the expansion. The details are 
tedious, but the method is straightforward and the result is 

\[ E[S^2] = \frac{n - 1}{n}\,\sigma^2 \tag{4-7} \]


where $\sigma^2$ is the true variance of the population. Note that the expected value of 
the sample variance is not the true variance. Thus, this is a biased estimate of the 
variance rather than an unbiased one. For most applications, one would like to 
have an unbiased estimate of any parameter. Hence, it is desirable to see if an 
unbiased estimate can be achieved readily. From (4-7), it is clear that one need 
only modify the original estimate by the factor n/(n - 1). Therefore, an unbiased 
estimate of the population variance can be achieved by defining the sample variance 
as 

\[ \tilde{S}^2 = \frac{n}{n - 1}\,S^2 = \frac{1}{n - 1}\sum_{i=1}^{n}\left(X_i - \hat{\bar{X}}\right)^2 \tag{4-8} \]
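
The bias factor (n − 1)/n can be confirmed without any algebra for a small case by averaging the sample variance over every equally likely sample. The sketch below does this exactly, using rational arithmetic, for a hypothetical population (a fair six-sided die sampled with replacement) that is chosen only for illustration and does not appear in the text.

```python
from fractions import Fraction
from itertools import product


def biased_sample_variance(sample):
    """Average squared deviation from the sample mean (divisor n)."""
    n = len(sample)
    mean = Fraction(sum(sample), n)
    return sum((x - mean) ** 2 for x in sample) / n


# Hypothetical population: the faces of a fair die, sampled with replacement.
values = [1, 2, 3, 4, 5, 6]
n = 3
pop_mean = Fraction(sum(values), len(values))
sigma2 = sum((v - pop_mean) ** 2 for v in values) / len(values)

# Every ordered sample of size n is equally likely, so the expected value
# of the sample variance is a plain average over all 6**n samples.
samples = list(product(values, repeat=n))
expected_s2 = sum(biased_sample_variance(s) for s in samples) / len(samples)

print(expected_s2, Fraction(n - 1, n) * sigma2)   # biased estimate and its predicted mean
print(expected_s2 * Fraction(n, n - 1), sigma2)   # after the n/(n - 1) correction
```

Both printed pairs agree exactly, which illustrates that multiplying by n/(n − 1) removes the bias for independent samples.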





Both of the above results have assumed that the population size is very large; 
i.e., $N = \infty$. When the population is not large, the expected value of $S^2$ is given 
by 

\[ E[S^2] = \frac{N}{N - 1}\cdot\frac{n - 1}{n}\,\sigma^2 \tag{4-9} \]

Note that this is also a biased estimate, but that the bias can be removed by 
defining $\tilde{S}^2$ as 

\[ \tilde{S}^2 = \frac{N - 1}{N}\cdot\frac{n}{n - 1}\,S^2 \tag{4-10} \]

Note that both of these results reduce to the previous ones as $N \to \infty$. 

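
The finite-population results can be checked in the same exhaustive way. The sketch below assumes a hypothetical population of five values sampled without replacement (the numbers are arbitrary and are not from the text) and takes the population variance with divisor N.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical population of N = 5 values, sampled without replacement.
population = [2, 3, 5, 7, 11]
N, n = len(population), 3

pop_mean = Fraction(sum(population), N)
sigma2 = sum((x - pop_mean) ** 2 for x in population) / N   # divisor N


def biased_sample_variance(sample):
    """Average squared deviation from the sample mean (divisor n)."""
    m = Fraction(sum(sample), len(sample))
    return sum((x - m) ** 2 for x in sample) / len(sample)


# Each unordered subset of size n is equally likely.
subsets = list(combinations(population, n))
expected_s2 = sum(biased_sample_variance(s) for s in subsets) / len(subsets)

predicted = Fraction(N, N - 1) * Fraction(n - 1, n) * sigma2        # form of (4-9)
corrected = Fraction(N - 1, N) * Fraction(n, n - 1) * expected_s2   # factor of (4-10)
print(expected_s2, predicted, corrected == sigma2)
```

Here the averaged sample variance matches the prediction of (4-9) exactly, and applying the correction factor of (4-10) recovers the population variance.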
The variance of the estimates of variance can also be obtained by straightforward, 
but tedious, methods. For example, it can be shown that the variance of $S^2$ 
is given by 

\[ \operatorname{Var}(S^2) = \frac{\mu_4 - \sigma^4}{n} \tag{4-11} \]

where $\mu_4$ is the fourth central moment of the population and is defined by 

\[ \mu_4 = E[(X - \bar{X})^4] \tag{4-12} \]

The variance of $\tilde{S}^2$ follows immediately from (4-7) and (4-8) as 

\[ \operatorname{Var}(\tilde{S}^2) = \frac{n(\mu_4 - \sigma^4)}{(n - 1)^2} \tag{4-13} \]


Only the large sample size case will be considered to illustrate an application 
of the above results. For this purpose, consider again the random time function 
displayed in Figure 4—1 and for which the sample mean has been discussed. It is 
found in that discussion that a sample size of 900 is required to reduce the stan- 
dard deviation of the sample mean to a value that is one percent of the true mean. 
Now suppose this same sample of size 900 is used to determine the sample variance; 
specifically, we will use it to calculate $\tilde{S}^2$ as defined in (4-8). Recall that $\tilde{S}^2$ 
is an unbiased estimate of the population variance. The variance of this estimate 
can now be evaluated from (4—13) if we know the fourth central moment. Unfor- 
tunately, the fourth central moment is not easily obtained unless we know the 
probability density function of the random variables. For the purpose of this dis- 
cussion, let us assume that the random waveform under consideration is Gaussian 
and that the random variables that make up the sample are mutually statistically 
independent. From equation (2-27) in Section 2-5, we know that the fourth central 
moment of a Gaussian random variable is just $3\sigma^4$. Using this value in (4-13), 
and remembering that for this waveform $\sigma^2$ is 9, leads to 

\[ \operatorname{Var}(\tilde{S}^2) = \frac{900(3 \times 9^2 - 9^2)}{(900 - 1)^2} = 0.1804 \]


This value of variance corresponds to a standard deviation of 0.4247, which is 
4.72 percent of the true population variance. One conclusion that can be drawn 
from this example, and which turns out to be fairly true in general, is that it takes 
a larger sample size to achieve a given accuracy in estimating the population 
variance than it does to estimate the population mean. 

It is also possible to determine the probability that the sample variance is within 
any specified region if the probability density function of $\tilde{S}^2$ is known. In the large 
sample size case, this probability density function may be assumed Gaussian as is 
done in the case of the sample mean. In the small sample size case, this is not 
reasonable. In fact, if the original random variables are Gaussian the probability 
density function of $\tilde{S}^2$ is chi-squared for any sample size. Another situation is 
discussed in a subsequent section. 
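
The numbers quoted in this example are easily reproduced. The following sketch evaluates (4-13) for the Gaussian case, in which the fourth central moment is three times the square of the variance; the function name is illustrative only.

```python
import math


def var_unbiased_sample_variance(n, mu4, sigma2):
    """Illustrative helper: large-sample variance of the unbiased
    variance estimate, following the form of (4-13)."""
    return n * (mu4 - sigma2 ** 2) / (n - 1) ** 2


sigma2 = 9.0
mu4 = 3.0 * sigma2 ** 2      # fourth central moment of a Gaussian variable
var_s2 = var_unbiased_sample_variance(900, mu4, sigma2)
std_s2 = math.sqrt(var_s2)

# Variance of the estimate, its standard deviation, and that standard
# deviation as a percentage of the true variance.
print(round(var_s2, 4), round(std_s2, 4), round(100 * std_s2 / sigma2, 2))
```

For comparison, the same 900 samples gave a sample mean whose standard deviation was one percent of the true mean, which is the basis for the remark that variance estimates need larger samples.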


Exercise 4-3.1 


For the random waveform of Fig. 4—1, find the sample size that would 
be required to estimate the true variance of the waveform with: 


a) a standard deviation of one percent of the true variance if an 
unbiased estimator is used 


b) a standard deviation of one percent of the true variance if a 
biased estimator is used. 


Answers: 20,000, 20,002 


Exercise 4—3.2 


Independent samples are taken from a random time function having a 
probability density function of 


\[ f_X(x) = \begin{cases} e^{-x}, & x \ge 0 \\ 0, & x < 0 \end{cases} \]


How many samples are required to estimate the variance of this time 
function with a standard deviation that is five percent of the true variance 
if an unbiased estimator is used? 


Answer: 3133 





4—4 Sampling Distributions and Confidence 
Intervals 


Although the mean and variance of any estimate of a population parameter do 
give useful information about the population, they are not sufficient to answer questions 
about the probability that these estimates are within specified bounds. In 
order to answer these questions, it is necessary to know the probability density 
functions associated with parameter estimates such as the sample mean or the 
sample variance. A great deal of effort has been expended in the study of statistics 
to determine these probability density functions and many such functions are de- 
scribed in the literature. Only two probability density functions are discussed here 
and these are discussed only for sample means. 

The sample mean is defined in (4-2) as 

\[ \hat{\bar{X}} = \frac{1}{n}\sum_{i=1}^{n} X_i \]


where n is the sample size and the $X_i$ are random variables from the population. If 
the $X_i$ are Gaussian and independent, with a mean of $\bar{X}$ and a variance of $\sigma^2$, then 
the normalized random variable Z, defined by 

\[ Z = \frac{\hat{\bar{X}} - \bar{X}}{\sigma/\sqrt{n}} \tag{4-14} \]

is Gaussian with zero mean and unit variance. Thus, when the population is Gaussian, 
the sample mean is also Gaussian regardless of the size of the population or 
the size of the sample provided that the true population standard deviation is 
known so that it can be used in (4-14) to normalize the random variable. If the 
population is not Gaussian, the central limit theorem assures us that Z is asymptotically 
Gaussian as $n \to \infty$. Hence, for large n, the sample mean may still be 
assumed to be Gaussian. Also, if the true population variance is not known, the 
$\sigma$ in (4-14) may be replaced by its estimate, $\tilde{S}$, since this estimate should be close 
to the true value for large n. The questions that arise in this case, however, are 
how large does n have to be and what does one do if n is not this large? 

A rule of thumb that is often used is that the Gaussian assumption is reasonable 
if $n \ge 30$. If the sample size is less than 30, and if the population random variables 
are not Gaussian, very little can be said in general and each situation must 
be examined in the light of its own particular characteristics. However, if the 
population random variables are Gaussian and the true population variance is not 
known, the normalized sample mean is no longer Gaussian because the $\tilde{S}$ that is 
used to replace $\sigma$ in (4-14) is also a random variable. It is possible to specify the 
probability density function of the normalized sample mean, however, and this 
topic is considered next. 

When n < 30, define the normalized sample mean as 

\[ T = \frac{\hat{\bar{X}} - \bar{X}}{\tilde{S}/\sqrt{n}} = \frac{\hat{\bar{X}} - \bar{X}}{S/\sqrt{n - 1}} \tag{4-15} \]

The random variable T is said to have a Student's t distribution¹ with n - 1 
degrees of freedom. 


¹The Student's t distribution was discovered by William Gosset who published it using the pen name 
'Student' because his employer, the Guinness Brewery, had a strict rule against their employees 
publishing their discoveries under their own names. 

In order to define the Student's t probability density function, let $\nu = n - 1$ 
be denoted as the degrees of freedom. The density function then is defined by 

\[ f_T(t) = \frac{\Gamma\!\left(\frac{\nu + 1}{2}\right)}{\sqrt{\pi\nu}\;\Gamma\!\left(\frac{\nu}{2}\right)}\left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu + 1}{2}} \tag{4-16} \]





where $\Gamma(\cdot)$ is the gamma function, some of whose essential properties are discussed 
below. This density function, for $\nu = 1$, is displayed in Figure 4-2, along 
with the normalized Gaussian density function for purposes of comparison. It may 
be noted that the Student's t density function has heavier tails than does the Gaussian 
density function. However, when $n \ge 30$ the two density functions are almost 
indistinguishable. 
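
A direct numerical evaluation makes the comparison of the two density functions concrete. The sketch below codes the standard Student's t density with ν degrees of freedom, using `math.gamma` from the Python standard library, alongside the unit Gaussian density; the function names are illustrative and are not from the text.

```python
import math


def student_t_pdf(t, nu):
    """Illustrative helper: Student's t density with nu degrees of freedom."""
    coeff = math.gamma((nu + 1) / 2) / (math.sqrt(math.pi * nu) * math.gamma(nu / 2))
    return coeff * (1 + t * t / nu) ** (-(nu + 1) / 2)


def gaussian_pdf(z):
    """Zero-mean, unit-variance Gaussian density."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)


# Compare the tails at t = 3 for increasing degrees of freedom.
for nu in (1, 8, 30):
    print(nu, round(student_t_pdf(3.0, nu), 5), round(gaussian_pdf(3.0), 5))
```

The Student's t value at t = 3 exceeds the Gaussian value in every case, and the gap shrinks as the number of degrees of freedom grows, which is the heavier-tail behavior described above.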

In order to evaluate the Student's t density function it is necessary to evaluate 
the gamma function. Fortunately, this can be done readily in this case by noting 
a few special relations. First, there is a recursion relation of the form 


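\[ \Gamma(k + 1) = \begin{cases} k\,\Gamma(k) & \text{any } k \\ k! & \text{integer } k \end{cases} \tag{4-17} \]

Next, some special values of the gamma function are 

\[ \Gamma(1) = \Gamma(2) = 1, \qquad \Gamma\!\left(\tfrac{1}{2}\right) = \sqrt{\pi} \]


Note that in evaluating the Student's t density function all arguments of the 
gamma function are either integers or one-half plus an integer. As an illustration 
of the application of (4-17), consider the evaluation of $\Gamma(3.5)$. 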







[Plot: Gaussian density and Student's t density for $\nu = 1$; horizontal axis from -3 to 3] 
Figure 4-2 Comparison of Student's t and Gaussian probability density functions. 

Table 4-1. Confidence Interval Width for a Gaussian Density Function. 

    q (%)     k 
    90        1.64 
    95        1.96 
    99        2.58 
    99.9      3.29 


Thus 

\[ \Gamma(3.5) = 2.5\,\Gamma(2.5) = 2.5 \cdot 1.5\,\Gamma(1.5) = 2.5 \cdot 1.5 \cdot 0.5\,\Gamma(0.5) = 2.5 \cdot 1.5 \cdot 0.5\,\sqrt{\pi} = 3.323 \]
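
The recursion lends itself to a very short program. The sketch below steps the argument down by one at a time until it reaches either 1 or 1/2, whose gamma function values are known, and compares the result with the library routine `math.gamma`; the function name is illustrative only.

```python
import math


def gamma_half_integer(x):
    """Illustrative helper: gamma function for x a positive integer
    or a positive integer plus one-half, by repeated use of
    Gamma(x + 1) = x * Gamma(x)."""
    result = 1.0
    while x > 1.0:
        x -= 1.0
        result *= x          # one step of the recursion
    if x == 1.0:
        return result        # Gamma(1) = 1
    if x == 0.5:
        return result * math.sqrt(math.pi)   # Gamma(1/2) = sqrt(pi)
    raise ValueError("argument must be a positive multiple of 1/2")


print(round(gamma_half_integer(3.5), 3), round(math.gamma(3.5), 3))
```

For integer arguments the same loop simply accumulates a factorial; for example, an argument of 4 yields 3! = 6.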


The concept of a confidence interval is one that arises very often in the study 
of statistics. Although the confidence interval is most appropriately considered in 
connection with estimation theory, it is convenient to discuss it here as an appli- 
cation of the probability density function of the sample mean. The sample mean, 
as we defined it, is really a point estimate in the sense that it assigns a single 
value to the estimate. The alternative to a point estimate is an interval estimate in 
which the parameter being estimated is declared to lie within a certain interval 
with a certain probability. This interval is the confidence interval. 

More specifically, a q-percent confidence interval is the interval within which 
the estimate will lie with a probability of q/100. The limits of this interval are the 
confidence limits and the value of q is said to be the confidence level. 

When considering the sample mean, the q-percent confidence interval is defined 
as 


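\[ \bar{X} - \frac{k\sigma}{\sqrt{n}} \le \hat{\bar{X}} \le \bar{X} + \frac{k\sigma}{\sqrt{n}} \tag{4-18} \]

where k is a constant that depends upon q and the probability density function of 
$\hat{\bar{X}}$. Specifically, 

\[ q = 100 \int_{\bar{X} - k\sigma/\sqrt{n}}^{\bar{X} + k\sigma/\sqrt{n}} f_{\hat{\bar{X}}}(x)\,dx \tag{4-19} \]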


For the Gaussian density function, the values of k can be tabulated readily as a 
function of the confidence level. A very limited table of this sort is given in Table 
4-1. 

As an illustration of the use of Table 4-1, consider once again the random waveform 
of Figure 4-1 for which the true population mean is 10, the true population 
variance is 9, and 900 samples are taken. The 95 percent confidence 
interval is just 

\[ 10 - \frac{1.96\sqrt{9}}{\sqrt{900}} \le \hat{\bar{X}} \le 10 + \frac{1.96\sqrt{9}}{\sqrt{900}} \]

Thus, there is a probability of 0.95 that the sample mean will lie in the interval 
between 9.804 and 10.196. 

Table 4-2. Probability Distribution for Student's t Function ($\nu = 8$). 

    t         F_T(t) 
    0.262     0.60 
    0.706     0.75 
    1.397     0.90 
    1.860     0.95 
    2.306     0.975 
    2.896     0.99 
    3.355     0.995 

It is worth noting that large confidence levels correspond to wide confidence 
intervals. Hence, there is a small probability that an estimate will lie within a very 
narrow confidence interval, but a large probability that it will lie within a broad 
confidence interval. It follows, therefore, that a 99 percent confidence level rep- 
resents a poorer estimate than does, say, a 90 percent confidence level when the 
same sample sizes are being compared. 
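
The widening of the interval with confidence level can be tabulated directly. The sketch below assumes a Gaussian sample mean and obtains k from the inverse of the Gaussian distribution function (`NormalDist().inv_cdf` in the Python standard library); the function names are illustrative and are not from the text.

```python
from statistics import NormalDist
import math


def gaussian_k(q_percent):
    """Illustrative helper: two-sided k for a q-percent confidence level."""
    tail = (1 - q_percent / 100) / 2          # area left in each tail
    return NormalDist().inv_cdf(1 - tail)


def confidence_interval(mean, variance, n, q_percent):
    """Interval mean -/+ k * sigma / sqrt(n) for n independent samples."""
    half_width = gaussian_k(q_percent) * math.sqrt(variance / n)
    return mean - half_width, mean + half_width


# Waveform example: true mean 10, true variance 9, 900 samples.
for q in (90, 95, 99, 99.9):
    low, high = confidence_interval(10.0, 9.0, 900, q)
    print(q, round(gaussian_k(q), 2), round(low, 3), round(high, 3))
```

Each successive line shows a larger k and a wider interval for the same 900 samples.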

The same information regarding confidence intervals can be obtained from the 
probability distribution function. Note that the integral in (4—19) can be replaced 
by the difference of two distribution functions. Hence, this relation could have 
been written as 


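\[ q = 100\left[F_{\hat{\bar{X}}}\!\left(\bar{X} + \frac{k\sigma}{\sqrt{n}}\right) - F_{\hat{\bar{X}}}\!\left(\bar{X} - \frac{k\sigma}{\sqrt{n}}\right)\right] \tag{4-20} \]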


It is also possible to tabulate k-values for the Student's t distribution, but a 
different set of values is required for each value of v, the degrees of freedom. 
However, it is customary to present this information in terms of the probability 
distribution function. A modest table of these values is given in Appendix F, 
while a much smaller table for the particular case of eight degrees of freedom is 
given in Table 4—2 to assist in the discussion that follows. 

The application of this table to several aspects of hypothesis testing is discussed 
in the next section. 

Exercise 4-4.1 

Calculate the value of the Student's t probability density function for 
t = 1 and for 

a) 5 degrees of freedom, 


b) 10 degrees of freedom. 


Answers: 0.2197, 0.2304 


Exercise 4—4.2 


A very large population of resistor values has a true mean of 100 Ω 
and a standard deviation of 5 Ω. The resistance values may be assumed 
to be Gaussian random variables. Find the confidence limits 
on the sample mean for a confidence level of 99 percent if it is com- 
puted from 


a) a sample size of 100, 


b) a sample size of 9. 


Answers: 94.41 to 105.59, 98.71 to 101.29 





4—5 Hypothesis Testing 


One of the important applications of statistics is that of making decisions about 
the parameters of a population. In the preceding sections we have seen how to 
estimate the mean value or the variance of a population and how to assign confi- 
dence intervals to these estimates for any specified level of confidence. The next 
step is to make some hypothesis about the population and then determine if the 
observed sample confirms or rejects this hypothesis. For example, a manufacturer 
may claim that the light bulbs he produces have an average lifetime of 1000 hours. 
The hypothesis is then made that the mean value of this population (i.e., the 
lifetimes of all light bulbs produced) is 1000 hours. Since it is not possible to run 
life tests on all the light bulbs produced, a small fraction is tested and the sample 
mean determined. The question then is: does the result of this test verify the 
hypothesis? To take an extreme example, suppose only two light bulbs are tested 
and the sample mean is found to be 900 hours. Does this prove that the hypothesis 
about the average lifetime of the population of all light bulbs is false? Probably 
not, because the sample size is too small to be able to make a reasonable decision. 
On the other hand, suppose the sample mean of these two light bulbs is 1000 
hours. Does this prove that the hypothesis is correct? Again, the answer is prob- 
ably not. The question then becomes: how does one decide to accept or reject a 
given hypothesis when the sample size and the confidence level are specified? We 
now have the background necessary to answer that question and will do so in 
several specific cases by means of examples. 

One way of classifying hypothesis tests is based on whether they are one-sided 
or two-sided. In a one-sided test, one is concerned with what happens on one side 
of the desired value of the parameter. For example, in the light bulb situation 
above, we are concerned only if the average lifetime is less than 1000 hours and 
would be happy to have the average lifetime greater than 1000 hours by any 
amount. There are many other situations of a comparable nature. On the other 
hand, in a two-sided test we are concerned about deviations in either direction 
from the hypothesized value. For example, if we have a supply of 100-Ω resistors 
that we are testing, it is equally serious if the resistance is either too high or 
too low. 

To consider the one-sided test first, imagine that a capacitor manufacturer 
claims that his capacitors have a mean value of breakdown voltage of 300 volts 
or greater. We test the breakdown voltage of a sample of 100 capacitors and find 
that the sample mean is 290 volts and the unbiased sample standard deviation, $\tilde{S}$, 
is 40 volts. Is the manufacturer's claim valid if a 99 percent confidence level is 
used? Note that this is a one-sided test since we don't care how much greater than 
300 volts the mean value of breakdown voltage might be. 

We start by making the hypothesis that the true mean value of the population 
is 300 volts and then check to see if this hypothesis is consistent with the observed 
data. Since the sample size is greater than 30, the Gaussian assumption may be 
employed here, with $\sigma$ set equal to $\tilde{S}$. Thus, the value of the normalized random 
variable, Z = z, is 

\[ z = \frac{\hat{\bar{X}} - \bar{X}}{\tilde{S}/\sqrt{n}} = \frac{290 - 300}{40/\sqrt{100}} = -2.5 \]

For a one-sided confidence level of 99 percent the critical value of z is found from 
that value above which the area of $f_Z(z)$ is 0.99. That is, 

\[ \int_{z_c}^{\infty} f_Z(z)\,dz = 1 - \Phi(z_c) = 0.99 \]

from which $z_c = -2.33$. Since the observed value of z is less than $z_c$, we would 
reject the hypothesis; that is, we would say that the claim that the mean break- 
down voltage is 300 volts or greater is not valid. 
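
The decision rule just described can be sketched as a small function. It assumes the large-sample Gaussian approximation with the sample standard deviation standing in for the true one, and it uses `NormalDist().inv_cdf` from the Python standard library for the critical value; the function name and argument names are illustrative only.

```python
from statistics import NormalDist
import math


def one_sided_z_test(sample_mean, claimed_mean, sample_std, n, confidence):
    """Illustrative helper: lower-tail test of the claim that the true
    mean is at least claimed_mean (large-sample Gaussian approximation).

    Returns (z, z_c, accept); the claim is rejected when z is below z_c.
    """
    z = (sample_mean - claimed_mean) / (sample_std / math.sqrt(n))
    z_c = NormalDist().inv_cdf(1 - confidence)   # area above z_c equals confidence
    return z, z_c, z >= z_c


# Capacitor example: 100 samples, sample mean 290 V, sample standard deviation 40 V.
z, z_c, accept = one_sided_z_test(290.0, 300.0, 40.0, 100, 0.99)
print(round(z, 2), round(z_c, 2), accept)
```

Calling the same function with a confidence of 0.995 moves the critical value further into the tail, to about -2.58, and the same data then lead to acceptance.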

An often confusing point in connection with hypothesis testing is the real meaning 
of the decision made. In the example above, the decision means that there is 
a probability of 0.99 that the observed sample did not come from a population 
having a true mean of 300 volts. This seems clear enough; the confusing point, 
however, is that had we chosen a confidence level of 99.5 percent we would have 
accepted the hypothesis because the critical value of z for this level of confidence 
is -2.575 and the observed z-value is now greater than $z_c$. Thus, choosing a high 
confidence level makes it more likely that any given sample will result in accepting 
the hypothesis. This seems contrary to logic, but the reason is clear; a high 
confidence level results in a wider confidence interval because a greater fraction of the 
probability density function must be contained in it. Conversely, selecting a small 
confidence level makes it less likely that any given sample will result in accepting 
the hypothesis and, thus, is a more severe requirement. Because the use of the 
term confidence level does seem to be contradictory, some statisticians prefer to 
use the level of significance, which is just the confidence level subtracted from 
100 percent. Thus, a confidence level of 99 percent corresponds to a 1 percent 
level of significance while a confidence level of 99.5 percent is only a 0.5 percent 
level of significance. A larger level of significance corresponds to a more severe 
test of the hypothesis. 

The example concerning the capacitor breakdown voltage is now reconsidered 
when the sample size is small. Suppose we test only 9 capacitors and find that the 
mean value of breakdown voltage is 290 V and the unbiased sample standard 
deviation is 40 V. Note that these are the same values that were obtained with a 
large sample size. However, since the sample size is less than 30 we will use the 
T random variable, which for this case is 

\[ t = \frac{\hat{\bar{X}} - \bar{X}}{\tilde{S}/\sqrt{n}} = \frac{290 - 300}{40/\sqrt{9}} = -0.75 \]

For the Student's t density function with $\nu = n - 1 = 8$ degrees of freedom, 
the critical value of t for a confidence level of 99 percent is, from Table 4-2, 
$t_c = -2.896$. Since the observed value of t is now greater than $t_c$ we would accept 
the hypothesis that the true mean breakdown voltage is 300 V or greater. 
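
The small-sample version of the test differs only in where the critical value comes from. Because the Python standard library has no inverse Student's t distribution function, the sketch below simply stores the entries of Table 4-2 (eight degrees of freedom) in a dictionary; the names are illustrative, and the function is valid only for a sample size of 9.

```python
import math

# Table 4-2: distribution function values F_T(t) for nu = 8, keyed by F_T(t).
T_TABLE_NU8 = {0.60: 0.262, 0.75: 0.706, 0.90: 1.397, 0.95: 1.860,
               0.975: 2.306, 0.99: 2.896, 0.995: 3.355}

N_SAMPLES = 9   # fixed, so that nu = n - 1 = 8 matches the table


def one_sided_t_test_nu8(sample_mean, claimed_mean, sample_std, confidence):
    """Illustrative helper: lower-tail Student's t test for n = 9 samples."""
    t = (sample_mean - claimed_mean) / (sample_std / math.sqrt(N_SAMPLES))
    t_c = -T_TABLE_NU8[confidence]     # symmetry: F_T(-t) = 1 - F_T(t)
    return t, t_c, t >= t_c


# Capacitor example with only 9 samples.
t, t_c, accept = one_sided_t_test_nu8(290.0, 300.0, 40.0, 0.99)
print(round(t, 2), t_c, accept)
```

For other sample sizes the dictionary would have to be replaced by the appropriate row of a fuller Student's t table.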

Note that the use of a small sample size tends to increase the value of t and, 
hence, makes it more likely to exceed the critical value. Furthermore, the small 
sample size leads to the use of the Student's t distribution, which has heavier tails 
than the Gaussian distribution and, thus, leads to a smaller value of $t_c$. Both of 
these factors together make small sample size tests less reliable than large sample 
size tests. 

The next example considers a two-sided hypothesis test. Suppose that a manu- 
facturer of Zener diodes claims that a certain type has a mean breakdown voltage 
of 10 V. Since a Zener diode is used as a voltage regulator, deviations from the 
desired value of breakdown voltage in either direction are equally undesirable. 
Hence, we hypothesize that the true mean value of the population is 10 V and 
then seek a test that either accepts or rejects this hypothesis and utilizes the fact 
that deviations on either side of 10 are of concern. 

Considering a large sample size test first, suppose we test 100 Zener diodes and 
find that the sample mean is 10.3 V and the unbiased sample standard deviation 
is 1.2 V. Is the claim valid if a 95 percent confidence level is used? Since the 
sample size is greater than 30, we can use the Gaussian random variable, Z, which 
for this sample is 


\[ z = \frac{10.3 - 10}{1.2/\sqrt{100}} = 2.5 \]


For a 95 percent confidence level, the critical values of the Gaussian random 
variable are, from Table 4-1, ±1.96. Thus, in order to accept the hypothesis it 
is necessary for z to lie in the region $-1.96 \le z \le 1.96$. Since z = 2.5 does not 
lie in this interval, the hypothesis is rejected; that is, the manufacturer's claim is 
not valid since the observed sample could not have come from a population having 
a mean value of 10 with a probability of 0.95. 
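
A two-sided test differs from the one-sided case only in splitting the excluded area equally between the two tails. The sketch below assumes the large-sample Gaussian approximation and uses `NormalDist().inv_cdf` from the Python standard library; the function name is illustrative only.

```python
from statistics import NormalDist
import math


def two_sided_z_test(sample_mean, claimed_mean, sample_std, n, confidence):
    """Illustrative helper: two-sided large-sample test.

    Returns (z, z_c, accept); the hypothesis is accepted when |z| <= z_c.
    """
    z = (sample_mean - claimed_mean) / (sample_std / math.sqrt(n))
    z_c = NormalDist().inv_cdf((1 + confidence) / 2)   # 0.975 for 95 percent
    return z, z_c, abs(z) <= z_c


# Zener diode example: 100 samples, sample mean 10.3 V, sample standard deviation 1.2 V.
z, z_c, accept = two_sided_z_test(10.3, 10.0, 1.2, 100, 0.95)
print(round(z, 2), round(z_c, 2), accept)
```

Using the absolute value of z in the comparison is equivalent to checking both the upper and the lower critical value.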

This same test is now repeated with a small sample size. Suppose that 9 Zener 
diodes are tested and it is found that the mean value of their breakdown voltages 
is again 10.3 V and the unbiased sample standard deviation is 1.2 V. The Stu- 
dent's t random variable now has a value of 


\[ t = \frac{\hat{\bar{X}} - \bar{X}}{\tilde{S}/\sqrt{n}} = \frac{10.3 - 10}{1.2/\sqrt{9}} = 0.75 \]


The critical values of t can be obtained from Table 4-2 since there are once 
again 8 degrees of freedom. Since Table 4-2 lists the distribution function for the 
Student's t random variable and we are interested in finding the interval around 
zero that contains 95 percent of the area, there will be 2.5 percent of the area 
above $t_c$ and 2.5 percent below $-t_c$. Thus, the value that we need from the table is 
that corresponding to 0.975. This is seen easily by noting that 

\[ \Pr[-t_c < T \le t_c] = F_T(t_c) - F_T(-t_c) = 2F_T(t_c) - 1 = 0.95 \]

Therefore 

\[ F_T(t_c) = \frac{1.95}{2} = 0.975 \]

From Table 4-2 the required value is $t_c = 2.306$. In order to accept the hypothesis, 
it is necessary that the observed value of t lie in the range $-2.306 < t \le 2.306$. 
Since t = 0.75 does lie in this range, the hypothesis is accepted and the 
manufacturer's claim is considered to be valid. Again we see that a small sample 
test is not as severe as a large sample test. 

Exercise 4-5.1 


A certain type of bipolar transistor is claimed to have a mean value of 
current gain of $\bar{\beta} = 200$. A sample of these transistors is tested and 
the sample mean value of current gain is found to be 190 and the 
unbiased sample standard deviation is 40. If a 95 percent confidence 
level is employed, is this claim valid if 


a) the sample size is 100? 
b) the sample size is 20? 


Answers: z = -2.5, $z_c$ = -1.645, no; 
t = -1.118, $t_c$ = -1.729, yes 


Exercise 4—5.2 


A certain type of bipolar transistor is claimed to have mean collector 
current of 4 mA. A sample of these transistors is tested and the sam- 
ple mean value of collector current is found to be 4.2 mA and the 
unbiased sample standard deviation is 0.8 mA. If a 95 percent confi- 
dence level is employed, is this claim valid if 


a) the sample size is 100? 
b) the sample size is 20? 


Answers: z = 2.5, $z_c$ = ±1.96, no; 
t = 1.118, $t_c$ = ±2.09, yes 





4—6 Curve Fitting and Linear Regression 


The topic considered in this section is considerably different from those in pre- 
vious sections, but it does represent an important application of statistics in engi- 
neering problems. Frequently, statistical data reveals a relationship between two 
or more variables and it is desired to express this relationship in mathematical 
form by determining an equation that connects the variables. For example, one 
might collect data on the lifetime of light bulbs as a function of the applied volt- 
age. Such data might be presented in the form of a scatter diagram, such as 
shown in Figure 4—3, in which each observed lifetime and the corresponding 
operating voltage are plotted as a point on a two-dimensional plane. 

Figure 4-3 Scatter diagram of light bulb lifetimes and applied voltage. (Vertical axis: lifetime in hours.) 


Also shown in Figure 4—3 is a solid curve that represents, in some sense, the 
best fit between the data points and a mathematical expression that relates the two 
variables. The objective of this section is to show one way of obtaining such a 
mathematical relationship. 

For purposes of discussion, it is convenient to consider the two variables as x 
and y. Since the data consists of specific numerical values, in keeping with our 
previously adopted notation, this data is represented by lower case letters. Thus, 
for a sample size of n we would have values of one variable denoted as $x_1, x_2, 
\ldots, x_n$ and corresponding values of the other variable as $y_1, y_2, \ldots, y_n$. For 
example, for the data displayed in Figure 4-3 each x-value might be an applied 
voltage and each y-value the corresponding lifetime. 

The general problem of finding a mathematical relationship to represent the data 
is called curve fitting. The resulting curve is called a regression curve and the 
mathematical equation is the regression equation. In order to find a "best" 
regression equation it is first necessary to establish a criterion that will be used to define 
what is meant by "best." Consider the scatter diagram and regression curve 
shown in Figure 4-4. 





Figure 4-4 Error between the regression curve and the scatter diagram. 

In this figure, the difference between the regression curve and the corresponding 
value of y at any x is designated as $d_i$, i = 1, 2, . . . , n. The criterion of 
goodness of fit that is employed here is that 


$$d_1^2 + d_2^2 + \cdots + d_n^2 = \text{a minimum}$$ (4-21) 


Such a criterion leads to a least-squares regression curve and is the criterion that 
is most often employed. Note that the least-squares criterion weights errors on 
either side of the regression curve equally and also weights large errors more than 
small errors. 

Having decided upon a criterion to use, the next step is to select the type of the 
equation that is to be fitted to the data. This choice is based largely on the nature 
of the data, but most often a polynomial of the form 


$$y = a + bx + cx^2 + \cdots + kx^j$$


is used. Although it is possible to fit an (n — 1)-degree polynomial to n data 
points, one would never want to do this because it would provide no smoothing 
of the data. That is, the resulting polynomial would go through each data point 
and the resulting least-squares error would be zero. Since the data is random, one 
is more interested in a regression curve that approximates the mean value of the 
data. Thus, in most cases, a first or second degree polynomial is employed. Our 
discussion in this section is limited to using a first degree polynomial in order to 
preserve simplicity while conveying the essential aspects of the method. This 
technique is referred to as linear regression. 
The linear regression equation becomes 


$$y = a + bx$$ (4-22) 
in which it is necessary to determine the values of a and b that satisfy (4-21). 
These are determined by writing 


$$\sum_{i=1}^{n} [y_i - (a + bx_i)]^2 = \text{a minimum}$$


In order to minimize this expression, one would differentiate partially with respect 
to a and b and set the derivatives equal to zero. This leads to two equations that 
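may be solved simultaneously for the values of a and b. The equations are 


$$\sum_{i=1}^{n} y_i = an + b\sum_{i=1}^{n} x_i$$

and 


$$\sum_{i=1}^{n} x_i y_i = a\sum_{i=1}^{n} x_i + b\sum_{i=1}^{n} x_i^2$$
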



Table 4-3. Data for Breakdown Voltage vs Temperature. 

i           1    2    3    4    5    6    7    8    9    10 
°C, x_i     10   20   30   40   50   60   70   80   90   100 
V, y_i      420  410  360  360  340  290  300  270  210  200 


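The resulting values of a and b are 

$$b = \frac{n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}$$ (4-23) 
and 
$$a = \frac{\sum_{i=1}^{n} y_i - b\sum_{i=1}^{n} x_i}{n}$$ (4-24) 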


Although these are fairly complicated expressions, they can be evaluated readily 
by computer or programmable calculator. 

An example serves to illustrate the technique. A manufacturer of capacitors 
wishes to determine the relationship between the breakdown voltage and the 
ambient temperature in which the capacitor operates. He tests 10 capacitors at 
different temperatures and obtains the data displayed in Table 4-3. From equations 
(4-23) and (4-24) the values of a and b become a = 451.33 and b = -2.461. 
Thus, the desired mathematical relationship is 


(Breakdown Voltage) = 451.33 - 2.461 (Temperature) 


The corresponding scatter diagram and linear regression curve are shown in Fig- 
ure 4-5. 
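
The capacitor example can be checked with a short program. The sketch below assumes the usual closed-form least-squares solution for a straight line (the slope from the sums of x, y, xy, and x squared, then the intercept from the slope); the function name `linear_regression` is illustrative only, and the data are those of Table 4-3.

```python
def linear_regression(xs, ys):
    """Least-squares coefficients (a, b) of the line y = a + b*x."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    # Slope first, then the intercept expressed in terms of the slope.
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b

# Table 4-3: temperature in degrees C and breakdown voltage in volts.
temperature = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
breakdown = [420, 410, 360, 360, 340, 290, 300, 270, 210, 200]

a, b = linear_regression(temperature, breakdown)
print(f"a = {a:.2f}, b = {b:.3f}")  # a = 451.33, b = -2.461
```

Applied to the four data points of Exercise 4-6.1, the same function returns approximately 4185 and -28.6.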

Similar techniques can be used to fit higher degree polynomials to experimental 
data. Obviously, the difficulty in determining the best values for the polynomial 
coefficients increases as the degree of the polynomial increases. However, there 
are very effective matrix formulations of the problem that lend themselves readily 
to computational methods. 

Figure 4-5 Linear regression curve for capacitor breakdown voltage vs ambient 
temperature. 
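
The matrix formulation mentioned above for higher degree polynomials can be sketched as follows. This is one possible approach, not necessarily the one the authors have in mind: it builds the normal equations for the polynomial coefficients and solves them by Gaussian elimination with partial pivoting. The function name `polyfit_least_squares` is hypothetical, and for high degrees the normal equations become poorly conditioned, so the sketch is meant for low-degree fits only.

```python
def polyfit_least_squares(xs, ys, degree):
    """Return [c0, c1, ..., c_degree] minimizing sum (y_i - sum_k c_k x_i**k)**2."""
    m = degree + 1
    # Normal equations: sum_k c_k * sum_i x_i**(j+k) = sum_i y_i * x_i**j
    A = [[sum(x ** (j + k) for x in xs) for k in range(m)] for j in range(m)]
    rhs = [sum(y * x ** j for x, y in zip(xs, ys)) for j in range(m)]

    # Forward elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        rhs[col], rhs[pivot] = rhs[pivot], rhs[col]
        for r in range(col + 1, m):
            factor = A[r][col] / A[col][col]
            for c in range(col, m):
                A[r][c] -= factor * A[col][c]
            rhs[r] -= factor * rhs[col]

    # Back substitution.
    coeffs = [0.0] * m
    for r in range(m - 1, -1, -1):
        known = sum(A[r][c] * coeffs[c] for c in range(r + 1, m))
        coeffs[r] = (rhs[r] - known) / A[r][r]
    return coeffs

# Degree 1 reproduces the linear regression of the Table 4-3 data.
temperature = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
breakdown = [420, 410, 360, 360, 340, 290, 300, 270, 210, 200]
line = polyfit_least_squares(temperature, breakdown, 1)
parabola = polyfit_least_squares(temperature, breakdown, 2)
print(line)
print(parabola)
```

With `degree=1` the first coefficient is the intercept a and the second is the slope b of the linear regression equation.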





Exercise 4—6.1 


Four light bulbs are tested to establish a relationship between lifetime 
and operating voltage. The resulting data are shown in the following 
table: 


i           1     2     3     4 
V, x_i      105   110   115   120 
Hrs, y_i    1200  1000  920   750 


Find the coefficients of the linear regression curve and plot it and the 
scatter diagram. 


Answers: 4185, —28.6 


Exercise 4—6.2 


Assume that the linear regression curve determined in Exercise 4—6.1 
holds for all values of voltage. Find the expected lifetime of a light bulb 
operating at a voltage of 


a) 95 volts 
b) 125 volts 
c) 117 volts. 

Answers: 838.8, 610, 1468 

PROBLEMS 
4-2.1 A calculator with a random number generator produces the following 
sequence of random numbers: 0.276, 0.123, 0.072, 0.324, 0.815, 0.312, 
0.432, 0.283, 0.717, 

a) Find the sample mean. 

b) If the calculator produces three digit random numbers that are 
uniformly distributed between 0.000 and 0.999, find the variance of the 
sample mean. 

c) How large should the sample size be in order to obtain a sample 
mean whose standard deviation is no greater than 0.01? 


4-2.2 A political poll is assessing the relative strengths of two presidential 
candidates. A value of +1 is assigned to every person who states a 
preference for candidate A and a value of -1 is assigned to anyone who 
indicates a preference for candidate B. 

a) Find the sample mean if 60 percent of those polled indicate a 
preference for candidate A. 

b) Write an expression for the sample mean as a function of the sample 
size and the percentage of those polled that are in favor of 
candidate A. 

c) Find the sample size necessary to estimate the percentage of persons 
in favor of candidate A with a standard deviation no greater than 0.1 
percent. 


4-2.3 In a class of 50 students, the result of a particular examination is a true 
mean of 70 and a true variance of 12. It is desired to estimate the mean 
by sampling, without replacement, a subset of the scores. 

a) Find the standard deviation of the sample mean if only 10 scores are 
used. 


b) How large should the sample size be for the standard deviation of the 
sample mean to be one percentage point (out of 100)? 

c) How large should the sample size be for the standard deviation of the 
sample mean to be one percent of the true mean? 


4-2.4 The HYGAYN Transistor Company produces a line of bipolar transistors 
that have an average current gain of 120 with a standard deviation of 10. 
Another company, ACE Electronics, produces a similar line of transistors 
with the same average current gain but with a standard deviation of 5. Ed 
Engineer purchases 20 transistors from each company and mixes them 
together. 

a) If Ed selects a random sample of 5 transistors with replacement, find 
the variance of the sample mean. 

b) If Ed selects a random sample of 5 transistors without replacement, 
find the variance of the sample mean. 

c) How large a sample size should Ed use, without replacement, in 
order to obtain a standard deviation of the sample mean of 2? 


4-2.5 For the transistors of Problem 4-2.4, assume that the current gains are 
independent Gaussian random variables. 

a) If Ed selects a random sample of 10 transistors with replacement, 
find the probability that the sample mean is within 2 percent of the 
true mean. 

b) Repeat part (a) if the sampling is without replacement. 


4-3.1 a) For the random numbers given in Problem 4-2.1, find the sample 
variance if an unbiased estimator is used. 

b) Find the variance of this estimate of the population variance. 


4-3.2 A zero-mean Gaussian random time function is sampled so as to obtain 
independent sample values. How many sample values are required to 
obtain an unbiased estimate of the variance of the time function with a 
standard deviation that is two percent of the true variance? 


4-3.3 It is desired to estimate the variance of a random phase angle that is 
uniformly distributed over a range of $2\pi$. Find the number of independent 
samples that are required to estimate this variance with a standard 
deviation that is five percent of the true variance if an unbiased estimate is 
used. 


4-4.1 a) Calculate the value of the Student's t probability density function for 
t = 2 and for 6 degrees of freedom. 

b) Repeat (a) for 12 degrees of freedom. 


4-4.2 A very large population of bipolar transistors has a current gain with a 
mean value of 120 and a standard deviation of 10. The values of current 
gain may be assumed to be independent Gaussian random variables. 

a) Find the confidence limits for a confidence level of 90 percent on the 
sample mean if it is computed from a sample size of 150. 

b) Repeat part (a) if the sample size is 21. 


4-4.3 Repeat Problem 4-4.2 if a one-sided confidence interval is considered. 
That is, find the value of current gain above which 90 percent of the 
sample means would lie. 


4-5.1 The resistance of coils manufactured by a certain company is claimed to 
have a mean value of resistance of 100 Ω. A sample of 9 coils is taken 
and it is found that the sample mean is 115 Ω and the sample standard 
deviation is 20 Ω. 

a) Is the claim justified if a 95 percent confidence level is used? 

b) Is the claim justified if a 90 percent confidence level is used? 


4-5.2 Repeat Problem 4-5.1 if the sample size is 50 coils, the sample mean is 
still 115 Ω, and the sample standard deviation is 10 Ω. 


4-5.3 A manufacturer of traveling wave tubes claims the mean lifetime is at 
least 4 years. Twenty of these tubes are installed in a communication 
satellite and a record kept of their performance. It is found that the mean 
lifetime of this sample is 3.7 years and the standard deviation of the 
sample is 1 year. 

a) For what confidence level would the company's claim be valid? 

b) What must the mean lifetime of the tubes have been in order for the 
claim to be valid at a confidence level of 90 percent? 


4-5.4 A manufacturer of capacitors claims the breakdown voltage has a mean 
value of at least 100 V. A test of nine capacitors yielded breakdown 
voltages of 97, 104, 95, 98, 106, 92, 110, 103, and 93 V. 


a) Find the sample mean. 
b) Find the sample variance using an unbiased estimate. 


c) Is the manufacturer's claim valid if a confidence level of 95 percent 
is employed? 


4—6.1 Data is taken for a random variable Y as a function of another variable X. 
The x-values are 1, 3, 4, 6, 8, 9, 11, 14 and the corresponding y-values 
are 1, 2, 4, 4, 5, 7, 8, 9. 


a) Plot the scatter diagram for this data. 


b) Find the linear regression curve that best fits this data. 


4—6.2 A test is made of the breakdown voltage of capacitors as a function of 
the capacitance. For capacitance values of 0.0001, 0.001, 0.01, 0.1, 1, 
10 microfarads the corresponding breakdown voltages are 310, 290, 285, 
270, 260, and 225 V. 


a) Plot the scatter diagram for this data on a semi-log coordinate 
system. 


b) Find the linear regression curve that best fits this data on a semi-log 
coordinate system. 

Random 
Processes 


5—1 Introduction 


It was noted in Chapter 2 that a random process is a collection of time functions 
and an associated probability description. The probability description may consist 
of the marginal and joint probability density functions of all random variables that 
are point functions of the process at specified time instants. This type of probabil- 
ity description is the only one that will be considered here. 

The entire collection of time functions is an ensemble and will be designated as 
{x(t)}, where any particular member of the ensemble, x(t), is a sample function of 
the ensemble. In general, only one sample function of a random process can ever 
be observed; the other sample functions represent all of the other possible 
realizations that might have occurred but did not. An arbitrary sample function is 
denoted X(t). The values of X(t) at any time $t_1$ define a random variable denoted as 
$X(t_1)$ or simply $X_1$. 

The extension of the concepts of random variables to those of random processes 
is quite simple as far as the mechanics are concerned; in fact, all of the essential 
ideas have already been considered. A more difficult step, however, is the con- 
ceptual one of relating the mathematical representations for random variables to 
the physical properties of the process. Hence, the purpose of this chapter is to 
help clarify this relationship by means of a number of illustrative examples. 

Many different classes of random processes arise in engineering problems. 
Since methods of representing these processes most efficiently do depend upon 
the nature of the process under consideration, it is necessary to classify random 
processes in a manner that assists in determining an appropriate type of represen- 
tation. Furthermore, it is important to develop a terminology that enables us to 
specify the class of process under consideration in a concise, but complete, man- 
ner so that there is no uncertainty as to which process is being discussed. 

Therefore, one of the first steps in discussing random processes is that of de- 
veloping a terminology that can be used as a ''short-cut'' in the description of the 
characteristics of any given process. A convenient way of doing this is to use a 
set of descriptors, arranged in pairs, and to select one name from each pair to 
describe the process. Those pairs of descriptors that are appropriate in the present 
discussion are: 


1. Continuous; discrete 
2. Deterministic; nondeterministic 
3. Stationary; nonstationary 
4. Ergodic; nonergodic 





Exercise 5—1.1 


a) If it is assumed that any random process can be described by 
picking one descriptor from each pair of descriptors shown 
above, how many classes of random processes can be de- 
scribed? 


b) It is also possible to consider mixed processes in which two or 
more random processes of the type described in (a) above are 
combined to form a single random process. If two random pro- 
cesses of the type described in (a) are combined, what is the 
total number of classes of random processes that can be 
described now by the above list of descriptors? 


Answers: 16, 256 
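
The counting in this exercise can be reproduced by enumeration. In the sketch below the variable names are illustrative, and the mixed-process count rests on the assumption that a combined process is described by an ordered pair of classes, which is the reading that agrees with the printed answer.

```python
from itertools import product

# One descriptor is chosen from each of the four pairs.
descriptor_pairs = [
    ("continuous", "discrete"),
    ("deterministic", "nondeterministic"),
    ("stationary", "nonstationary"),
    ("ergodic", "nonergodic"),
]

classes = list(product(*descriptor_pairs))
num_classes = len(classes)      # 2 * 2 * 2 * 2
num_mixed = num_classes ** 2    # assumed: ordered pairs of classes

print(num_classes, num_mixed)   # 16 256
```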


Exercise 5—1.2 


a) A time function is generated by flipping two coins once every 
second. A value of +1 is assigned to each head and a value 
of —1 is assigned to each tail. The time function has a constant 
value equal to that obtained from the sum of the two coins for 
one second and then changes to the new value determined by 
the outcome on the next flip of the coins. Sketch a typical 
sample function of the random process defined in this way. Let the 
sample function be eight seconds long and let it exhibit all 
possible states with the correct probabilities. 


b) How many possible sample functions, each eight seconds long, 
does the entire ensemble of sample functions for this random 
process have? 


Answer: 6561 
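
The two-coin process of this exercise is easy to simulate. The following sketch (function and variable names are illustrative) tabulates the probability of each state, draws one eight-second sample function, and counts the members of the ensemble; the particular sample function printed depends on the random seed.

```python
import random
from itertools import product

# Probability of each possible sum of two fair coins scored +1 (head) or -1 (tail).
pmf = {}
for coin1, coin2 in product((+1, -1), repeat=2):
    total = coin1 + coin2
    pmf[total] = pmf.get(total, 0) + 0.25

def sample_function(seconds, rng):
    """One sample function: the sum of two coin values for each one-second interval."""
    return [rng.choice((+1, -1)) + rng.choice((+1, -1)) for _ in range(seconds)]

rng = random.Random(0)
print(pmf)                      # states -2, 0, +2 with probabilities 1/4, 1/2, 1/4
print(sample_function(8, rng))  # one possible realization

# Each of the 8 intervals can take any of the 3 states.
num_sample_functions = len(pmf) ** 8
print(num_sample_functions)     # 6561
```

A sketch that shows the states with the correct probabilities over eight seconds would therefore spend two seconds at +2, four seconds at 0, and two seconds at -2.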





5—2 Continuous and Discrete Random 
Processes 


These terms normally apply to the possible values of the random variables. A 
continuous random process is one in which random variables such as $X(t_1)$, $X(t_2)$, 
and so on, can assume any value within a specified range of possible values. This 
range may be finite, infinite, or semi-infinite. Such things as thermal agitation 
noise in conductors, shot noise in electron tubes or transistors, and wind velocity 
are examples of continuous random processes. A sketch of a typical sample 
function and the corresponding probability density function is shown in Figure 5-1. 
In this example, the range of possible values is semi-infinite. 

A more precise definition for continuous random processes would be that the 
probability distribution function is continuous. This would also imply that the 
density function has no $\delta$ functions in it. 

A discrete random process is one in which the random variables can assume 
only certain isolated values (possibly infinite in number) and no other values. For 
example, a voltage that is either 0 or 100 because of random opening and closing 
of a switch would be a sample function from a discrete random process. This is 
illustrated in Figure 5-2. Note that the probability density function contains only 
$\delta$ functions. 

Figure 5-1 A continuous random process: (a) typical sample function and (b) 
probability density function. 

Figure 5-2 A discrete random process: (a) typical sample function and (b) 
probability density function. 

It is also possible to have mixed processes, which have both continuous and 
discrete components. For example, the current flowing in an ideal rectifier may be 
zero for one-half the time, as shown in Figure 5-3. The corresponding probability 
density has both a continuous part and a $\delta$ function. 

Some other examples of random processes will serve to further illustrate the 
concept of continuous and discrete random processes. Thermal noise in an 
electronic circuit is a typical example of a continuous random process since its 
amplitude can take on any positive or negative value. The probability density function 
of thermal noise is a continuous function from minus infinity to plus infinity. 
Quantizing error associated with analog-to-digital conversion, as discussed in 
Section 2-7, is another example of a continuous random process since this error may 
have any value within a finite range of values determined by the size of the 
increment between quantization levels. The probability density function for the 
quantizing error is usually assumed to be uniformly distributed over the range of 
possible errors. This case represents a minor departure from the strict mathematical 
definition for a continuous probability density function since the uniform density 
function is not continuous at the end points. Nevertheless, since the density 
function does not contain any $\delta$ functions, we consider the random process to be 
continuous for purposes of our classification. 

Figure 5-3 A mixed random process: (a) typical sample function and (b) 
probability density function. 

On the other hand, if one represents the number of telephone calls in progress 
in a telephone system as a random process, the resulting process is discrete since 
the number of calls must be an integer. The probability density function for this 
process contains only a large number of $\delta$ functions. Another example of a 
discrete random process is the result of quantizing a sample function from a 
continuous random process into another random process that can have only a finite 
number of possible values. For example, an 8-bit analog-to-digital converter takes 
an input signal that may have a continuous probability density function and 
converts it into one that has a discrete probability density function with 256 $\delta$ 
functions. 

Finally we consider some mixed processes that have both a continuous 
component and a discrete component. One such example is the rectified time function as 
noted above. Another example might be a system containing a limiter such that 
when the output magnitude is less than the limiting value, it has the same value 
as the input. However, the output magnitude can never exceed the limiting value 
regardless of how large the input becomes. Thus, a sample function from a 
continuous random process on the input will produce a sample function from a mixed 
random process on the output and the probability density function of the output 
will have both a continuous part and a pair of $\delta$ functions. 

In all of the cases just mentioned, the sample functions are continuous in time; 
that is, a random variable may be defined for any time. Situations in which the 
random variables exist for particular time instants only (referred to as point 
processes or time series) are not discussed in this chapter. 
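
The limiter example above can be illustrated numerically. The sketch below assumes a zero-mean Gaussian input and an ideal limiter (both the parameter values and the function names are illustrative, not taken from the text). The fraction of output samples that sit exactly at the limiting values estimates the combined weight of the pair of delta functions, which for this input is the probability that the input magnitude reaches the limit.

```python
import random
from statistics import NormalDist

def limiter(v, limit):
    """Ideal limiter: pass v unchanged while |v| < limit, clip to +/-limit otherwise."""
    return max(-limit, min(limit, v))

def fraction_at_limits(sigma, limit, num_samples, seed=0):
    """Fraction of limiter outputs that equal +limit or -limit for Gaussian input."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_samples):
        out = limiter(rng.gauss(0.0, sigma), limit)
        if abs(out) == limit:
            hits += 1
    return hits / num_samples

sigma, limit = 1.0, 1.0
estimated = fraction_at_limits(sigma, limit, 20000)

# Combined weight of the two delta functions: Pr(|input| >= limit).
expected = 2.0 * (1.0 - NormalDist(0.0, sigma).cdf(limit))
print(estimated, round(expected, 4))
```

The remaining probability is spread continuously over the interval between the limits, which is why the output process is classified as mixed.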





Exercise 5-2.1 

Gaussian noise having zero mean and a variance of 0.1 V² is added 
to a random binary signal having values of ±1 V. 

a) Classify the sum signal as continuous, discrete, or mixed. 


b) Repeat the classification of the sum signal after it has passed 
through a limiter that limits at ±1.1 V. 

c) Repeat the classification of the sum signal after it has passed 
through an ideal hard limiter whose input-output 
characteristic is 

$$V_{out} = \mathrm{sgn}(V_{in})$$


Answers: Mixed, continuous, discrete 

Exercise 5-2.2 


A random time function has a mean value of 10 and an amplitude that 
is Rayleigh distributed. This random time function is multiplied by a 
sinusoid having a maximum value of 10 and a random phase that is 
uniformly distributed between 0 and $2\pi$. 


a) Classify the product as continuous, discrete, or mixed. 


b) Classify the product after it has been passed through an ideal 
half-wave rectifier. 


c) Suppose the sinusoid is passed through an ideal half-wave rec- 
tifier before it multiplies the Rayleigh distributed time function. 
Classify the product. 


Answers: Continuous, mixed, mixed 





5—3 Deterministic and Nondeterministic 
Random Processes 


In most of the discussion so far, it has been implied that each sample function is 
a random function of time and, as such, its future values cannot be exactly pre- 
dicted from the observed past values. Such a random process is said to be non- 
deterministic. Almost all natural random processes are nondeterministic, because 
the basic mechanism that generates them is either unobservable or extremely com- 
plex. All the examples presented in Section 5-2 are nondeterministic. 

It is possible, however, to define random processes for which the future values 
of any sample function can be exactly predicted from a knowledge of the past 
values. Such a process is said to be deterministic. As an example, consider a 
random process for which each sample function of the process is of the form 


$$X(t) = A \cos(\omega t + \theta)$$ (5-1) 


where A and $\omega$ are constants and $\theta$ is a random variable with a specified 
probability distribution. That is, for any one sample function, $\theta$ has the same value for 
all t but different values for the other members of the ensemble. In this case, the 
only random variation is over the ensemble, not with respect to time. It is still 
possible to define random variables $X(t_1)$, $X(t_2)$, and so on, and to determine 
probability density functions for them. 

As a second example of a deterministic process, consider a periodic random 
process having sample functions of the form 


$$X(t) = \sum_{n=0}^{\infty} [A_n \cos(2\pi n f_0 t) + B_n \sin(2\pi n f_0 t)]$$ (5-2) 

in which the $A_n$ and the $B_n$ are independent random variables that are fixed for 
any one sample function but are different from sample function to sample 
function. Given the past history of any sample function, one can determine these 
coefficients and predict exactly all future values of X(t). 
It is not necessary that deterministic processes be periodic, although this is 
probably the most common situation that arises in practical applications. For 
example, a deterministic random process might have sample functions of the form 


$$X(t) = A \exp(-\beta t), \quad t \ge 0$$ (5-3) 


in which A and $\beta$ are random variables that are fixed for any one sample function 
but vary from sample function to sample function. 

Although the concept of deterministic random processes may seem a little arti- 
ficial, it often is convenient to obtain a probability model for signals that are 
known except for one or two parameters. The process described by (5-1), for 
example, may be suitable to represent a radio signal in which the magnitude and 
frequency are known, but the phase is not because the precise distance (within a 
fraction of a wavelength) between transmitter and receiver is not. 
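
The sense in which the random-phase cosine of (5-1) is deterministic can be illustrated with a short sketch: once two past values of a sample function are known, the phase can be recovered and every future value predicted exactly. The numerical values of the amplitude and frequency, and the function names, are assumptions made only for this illustration.

```python
import math
import random

A = 2.0       # known amplitude (illustrative value)
OMEGA = 3.0   # known angular frequency (illustrative value)

def x(t, theta):
    """Sample function X(t) = A cos(OMEGA*t + theta) for a fixed phase theta."""
    return A * math.cos(OMEGA * t + theta)

def recover_phase(x0, x1, t1):
    """Recover theta from X(0) = x0 and X(t1) = x1, assuming sin(OMEGA*t1) != 0."""
    cos_theta = x0 / A
    # X(t1) = x0*cos(OMEGA*t1) - A*sin(OMEGA*t1)*sin(theta), solved for sin(theta).
    sin_theta = (x0 * math.cos(OMEGA * t1) - x1) / (A * math.sin(OMEGA * t1))
    return math.atan2(sin_theta, cos_theta)

# The phase is random over the ensemble but fixed for this sample function.
theta = random.Random(1).uniform(0.0, 2.0 * math.pi)

theta_hat = recover_phase(x(0.0, theta), x(0.25, theta), 0.25)
predicted = x(10.0, theta_hat)   # prediction from the two past observations
actual = x(10.0, theta)
print(abs(predicted - actual) < 1e-9)
```

No such exact prediction is possible for a nondeterministic process such as thermal noise, however many past values are observed.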





Exercise 5—3.1 


A sample function of the random process defined by Equation (5—1) 
is observed at three different time instants and found to have the fol- 
lowing values: 
X(0) = 0 X(1) = 10 X(2) = 0 
There are no zeros between t = 0 and t = 2. 
a) Find the values of A, $\omega$, and $\theta$. 


b) Find the value of X(2.5). 


Answers: 1.57, 10, -7.07, -1.57 


Exercise 5-3.2 


A random process has sample functions of the form 

$$X(t) = \sum_{n=-\infty}^{\infty} A_n f(t - nt_1)$$


where the $A_n$ are independent random variables that are uniformly 
distributed from 0 to 10, and 


$$f(t) = \begin{cases} 1, & 0 \le t < \tfrac{1}{2} t_1 \\ 0, & \text{elsewhere} \end{cases}$$


a) Is this process deterministic or nondeterministic? Why? 


b) Is this process continuous, discrete, or mixed? Why? 


Answers: Nondeterministic, mixed 








5—4 Stationary and Nonstationary Random 
Processes 


It has been noted that one can define a probability density function for random 
variables of the form $X(t_1)$ but so far no mention has been made of the dependence 
of this density function on the value of time $t_1$. If all marginal and joint density 
functions of the process do not depend upon the choice of time origin, the process 
is said to be stationary. In this case, all of the mean values and moments dis- 
cussed previously are constants that do not depend upon the absolute value of 
time. 

If any of the probability density functions do change with the choice of time 
origin, the process is nonstationary. In this case, one or more of the mean values 
or moments will also depend on time. Since the analysis of systems responding to 
nonstationary random inputs is more involved than in the stationary case, all fu- 
ture discussions are limited to the stationary case unless it is specifically stated to 
the contrary. 

In a rigorous sense, there are no stationary random processes that actually exist 
physically, since any process must have started at some finite time in the past and 
must presumably stop at some finite time in the future. However, there are many 
physical situations in which the process does not change appreciably during the 
time it is being observed. In these cases the stationary assumption leads to a 
convenient mathematical model, which closely approximates reality. 

Determining whether or not the stationary assumption is reasonable for any 
given situation may not be easy. For nondeterministic processes, it depends upon 
the mechanism of generating the process and upon the time duration over which 
the process is observed. As a rule of thumb, it is customary to assume stationarity 
unless there is some obvious change in the source or unless common sense dictates 
otherwise. For example, the thermal noise generated by the random motion of 
electrons in a resistor might reasonably be considered stationary under normal 
conditions. However, if this resistor were being intermittently heated by a current 
through it, the stationary assumption would obviously be false. As another example, it 
might be reasonable to assume that random wind velocity comes from a stationary 
source over a period of one hour, say, but common sense indicates that applying 
this same assumption to a period of one week might be unreasonable. 

Deterministic processes are usually stationary only under certain very special 
conditions. It is customary to assume that these conditions exist, but one must be 
aware that this is a deliberate choice and not necessarily a natural occurrence. For 
example, in the case of the random process defined by (5-1), the reader may 
easily show (by calculating the mean value) that the process may be (and, in fact, 
is) stationary when $\theta$ is uniformly distributed over a range from 0 to $2\pi$, but that 
it is definitely not stationary when $\theta$ is uniformly distributed over a range from 0 
to $\pi$. The random process defined by (5-2) can be shown to be stationary if the 
$A_n$ and the $B_n$ are independent, zero mean, Gaussian random variables, with 
coefficients of the same index having equal variances. Under most other situations, 
however, this random process will be nonstationary. The random process defined 
by (5-3) is nonstationary under all circumstances. 
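
The mean-value calculation suggested above for the random-phase cosine can also be carried out numerically. The sketch below (illustrative function name and parameter values) averages A cos(ωt + θ) over a uniformly distributed phase by the midpoint rule. For a phase range of 2π the mean is zero at every t, while for a range of π direct integration gives -(2A/π) sin ωt, which changes with t.

```python
import math

def ensemble_mean(t, phase_range, amplitude=1.0, omega=1.0, steps=10000):
    """Midpoint-rule estimate of E[A cos(omega*t + theta)], theta uniform on (0, phase_range)."""
    d_theta = phase_range / steps
    total = 0.0
    for k in range(steps):
        theta = (k + 0.5) * d_theta
        total += amplitude * math.cos(omega * t + theta)
    return total / steps

for t in (0.0, 0.7, 2.0):
    full = ensemble_mean(t, 2.0 * math.pi)   # stays at zero for every t
    half = ensemble_mean(t, math.pi)         # follows -(2/pi) sin(t) for A = omega = 1
    print(t, round(full, 6), round(half, 6))
```

A time-varying mean is enough to rule out stationarity, so only the full 2π range can give a stationary process here.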

The requirement that all marginal and joint density functions be independent of 
the choice of time origin is frequently more stringent than is necessary for systems 
analysis. A more relaxed requirement, which is often adequate, is that the mean 
value of any random variable, $X(t_1)$, is independent of the choice of $t_1$ and that 
the correlation of two random variables, $E[X(t_1)X(t_2)]$, depends only upon the time 
difference, $t_2 - t_1$. Processes that satisfy these two conditions are said to be 
stationary in the wide sense. This wide-sense stationarity is adequate to guarantee 
that the mean value, mean-square value, variance, and correlation coefficient of 
any pair of random variables are constants independent of the choice of time 
origin. 
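
As an illustration of the second wide-sense condition, the correlation of two random variables taken from the random-phase cosine process can be evaluated numerically. The sketch below assumes a phase that is uniformly distributed over 2π; the function name and parameter values are illustrative. The result matches (A²/2) cos ω(t₂ - t₁), so shifting both time instants by the same amount leaves it unchanged.

```python
import math

def correlation(t1, t2, amplitude=1.0, omega=1.0, steps=10000):
    """Midpoint-rule estimate of E[X(t1) X(t2)] for X(t) = A cos(omega*t + theta),
    with theta uniformly distributed over (0, 2*pi)."""
    d_theta = 2.0 * math.pi / steps
    total = 0.0
    for k in range(steps):
        theta = (k + 0.5) * d_theta
        x1 = amplitude * math.cos(omega * t1 + theta)
        x2 = amplitude * math.cos(omega * t2 + theta)
        total += x1 * x2
    return total / steps

r_early = correlation(0.3, 1.0)   # time difference 0.7
r_late = correlation(5.3, 6.0)    # same time difference, shifted origin
print(r_early, r_late, 0.5 * math.cos(0.7))
```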

In subsequent discussions of the response of systems to random inputs it will 
be found that the evaluation of this response is made much easier when the proc- 
esses may be assumed either strictly stationary or stationary in the wide sense. 
Since the results are identical for either type of stationarity, it is not necessary to 
distinguish between the two in any future discussion. 





Exercise 5—4.1 


a) For the random process described in Exercise 5-3.2, find the 
mean value of the random variable $X(t_1/4)$. 


b) Find the mean value of the random variable $X(3t_1/4)$. 

c) Is the process stationary? Why? 


Answers: No, 5, 0 


Exercise 5—4.2 


A random process has sample functions of the form 
$$X(t) = A \cos(\omega t + \theta)$$
in which A and $\omega$ are constants and $\theta$ is a random variable. 
a) Prove that this process is stationary in the wide sense if $\theta$ is 
uniformly distributed between 0 and $2\pi$. 


b) Prove that this process cannot be stationary if $\theta$ is not uniformly 
distributed over a range of $2\pi$. 





5—5 Ergodic and Nonergodic Random 
Processes 


Some stationary random processes possess the property that almost every member¹ 
of the ensemble exhibits the same statistical behavior as the whole ensemble. 
Thus, it is possible to determine this statistical behavior by examining only one 
typical sample function. Such processes are said to be ergodic. 

For ergodic processes, the mean values and moments can be determined by time
averages as well as by ensemble averages. Thus, for example, the nth moment is
given by

$$\overline{X^n} = \int_{-\infty}^{\infty} x^n f(x)\,dx = \lim_{T\to\infty} \frac{1}{2T}\int_{-T}^{T} X^n(t)\,dt$$    (5-4)

It should be emphasized, however, that this condition cannot exist unless the pro- 
cess is stationary. Thus, ergodic processes are also stationary processes. 

¹The term "almost every member" implies that a set of sample functions having total probability of
zero may not exhibit the same behavior as the rest of the ensemble. But having zero probability does
not mean that such a sample function is impossible.

A process that does not possess the property of (5-4) is nonergodic. All nonstationary
processes are nonergodic, but it is also possible for stationary processes
to be nonergodic. For example, consider sample functions of the form


$$X(t) = Y\cos(\omega t + \theta)$$    (5-5)


where $\omega$ is a constant, $Y$ is a random variable (with respect to the ensemble), and
$\theta$ is a random variable that is uniformly distributed over 0 to $2\pi$, with $\theta$ and $Y$
being statistically independent. This process can be shown to be stationary but
nonergodic, since $Y$ is a constant in any one sample function but is different for
different sample functions.

It is generally difficult, if not impossible, to prove that ergodicity is a reason- 
able assumption for any physical process, since only one sample function of the 
process can be observed. Nevertheless, it is customary to assume ergodicity unless 
there are compelling physical reasons for not doing so. 
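The nonergodic behavior of (5-5) can be seen in a short numerical sketch. It rests on assumptions made only for illustration: the function name `time_mean_square`, the step size, the record length, and the three $(Y, \theta)$ pairs are all hypothetical. The time average of $X^2(t)$ along one sample function comes out as $Y^2/2$ for that function's own value of $Y$, whereas the ensemble mean-square value is $E[Y^2]/2$, so no single sample function reveals the ensemble value unless $Y$ is the same for every member.

```python
import math


def time_mean_square(y, theta, omega=2.0 * math.pi, dt=0.01, n_steps=20_000):
    """Time average of X^2(t) along one sample function X(t) = y cos(omega t + theta)."""
    total = 0.0
    for k in range(n_steps):
        x = y * math.cos(omega * k * dt + theta)
        total += x * x
    return total / n_steps


if __name__ == "__main__":
    # Three hypothetical sample functions: (value of Y, value of theta).
    for y, theta in ((0.5, 0.3), (1.0, 2.0), (2.0, 4.5)):
        print(f"Y = {y}: time-averaged X^2 = {time_mean_square(y, theta):.4f}")
```

The phase has no effect on the time average, but the amplitude does, which is why the random phase alone leaves the process ergodic while the random amplitude does not.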





Exercise 5—5.1 


A random process has sample functions of the form 
$$X(t) = A$$


where A is a zero-mean, Gaussian random variable having a variance 
of 4. 


a) Is this process stationary in the wide sense? 


b) Is the process ergodic? Why? 


Answers: No, yes 


Exercise 5—5.2 
A random process has sample functions of the form


$$X(t) = \sum_{n=-\infty}^{\infty} A f(t - nT - t_0)$$


where $A$ and $T$ are constants and $t_0$ is a random variable that is uniformly
distributed between 0 and $T$. The function $f(t)$ is defined by


$$f(t) = 1 \qquad 0 \le t \le T/2$$


and is zero elsewhere.
a) Find $\overline{X}$ and $\overline{X^2}$.

b) Find $\langle x \rangle$ and $\langle x^2 \rangle$, where $\langle \cdot \rangle$ implies a time average.
c) Can this process be stationary? 

d) Can this process be ergodic? 


Answers: A/2, A?/2, yes, yes 





5—6 Measurement of Process Parameters 


The statistical parameters of a random process are the sets of statistical parameters
(such as mean, mean-square, and variance) associated with the $X(t)$ random variables
at various times $t$. In the case of a stationary process these parameters are
the same for all such random variables, and, hence, it is customary to consider
only one set of parameters.

A problem of considerable practical importance is that of estimating the process 
parameters from the observations of a single sample function (since one sample 
function of finite length is all that is ever available). Because there is only one 
sample function it is not possible to make an ensemble average in order to obtain 
estimates of the parameters. The only alternative, therefore, is to make a time 
average. If the process is ergodic, this is a reasonable approach because a time 
average (over infinite time) is equivalent to an ensemble average, as indicated by 
(5—4). Of course, in most practical situations, we cannot prove that the process is 
ergodic and it is usually necessary to assume that it is ergodic unless there is some 
clear physical reason why it should not be. Furthermore, it is not possible to take 
a time average over an infinite time interval, and a time average over a finite time 
interval will always be just an approximation to the true value. The following 
discussion is aimed at determining how good this approximation is, and upon what 
aspects of the measurement the goodness of the approximation depends. 

Consider first the problem of estimating the mean value of an ergodic random
process $\{x(t)\}$. This estimate will be designated as $\hat{\bar{X}}$ and will be computed from a
finite time average. Thus, for an arbitrary member of the ensemble, let

$$\hat{\bar{X}} = \frac{1}{T}\int_0^T X(t)\,dt$$    (5-6)

It should be noted that although $\hat{\bar{X}}$ is a single number in any one experiment, it is
also a random variable, since a different number would be obtained if a different
time interval were used or if a different sample function had been observed. Thus,
$\hat{\bar{X}}$ will not be identically equal to the true mean value $\bar{X}$, but if the measurement
is to be useful it should be close to this value. Just how close it is likely to be is
discussed below.

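Since $\hat{\bar{X}}$ is a random variable, it has a mean value and a variance. If $\hat{\bar{X}}$ is to be
a good estimate of $\bar{X}$, then the mean value of $\hat{\bar{X}}$ should be equal to $\bar{X}$ and the
variance should be small. From (5-6) the mean value of $\hat{\bar{X}}$ is

$$E[\hat{\bar{X}}] = E\left[\frac{1}{T}\int_0^T X(t)\,dt\right] = \frac{1}{T}\int_0^T E[X(t)]\,dt = \frac{1}{T}\int_0^T \bar{X}\,dt = \bar{X}$$    (5-7)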


The interchange of expectation and integration is permissible in this case and
represents a common type of operation. The conditions where such interchanges
are possible will be discussed in more detail in Chapter 8. It is clear from (5-7)
that $\hat{\bar{X}}$ has the proper mean value. The evaluation of the variance of $\hat{\bar{X}}$ is
considerably more involved and requires a knowledge of autocorrelation functions, a
topic that is considered in the next chapter. However, the variance of such
estimates is considered for the following discrete time case. It is sufficient to note
here that the variance turns out to be proportional to $1/T$. Thus, a better estimate
of the mean is found by averaging the sample function over a longer time interval.
As $T$ approaches infinity, the variance approaches zero and the estimate becomes
equal with probability one to the true mean, as it must for an ergodic process.

As a practical matter, the integration required by (5-6) can seldom be carried
out analytically because $X(t)$ cannot be expressed in an explicit mathematical
form. The alternative is to perform numerical integration upon samples of $X(t)$
observed at equally spaced time instants. Thus, if $X_1 = X(\Delta t)$, $X_2 = X(2\Delta t)$, $\ldots$,
$X_N = X(N\Delta t)$, then the estimate of $\bar{X}$ may be expressed as

$$\hat{\bar{X}} = \frac{1}{N}\sum_{i=1}^{N} X_i$$    (5-8)


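This is the discrete time counterpart of (5-6).
The estimate $\hat{\bar{X}}$ is still a random variable and has an expected value of

$$E[\hat{\bar{X}}] = E\left[\frac{1}{N}\sum_{i=1}^{N} X_i\right] = \frac{1}{N}\sum_{i=1}^{N} E[X_i] = \frac{1}{N}\sum_{i=1}^{N} \bar{X} = \bar{X}$$    (5-9)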


Hence, the estimate still has the proper mean value. 
In order to evaluate the variance of $\hat{\bar{X}}$ it is assumed that the observed samples
are spaced far enough apart in time so that they are statistically independent. This
assumption is made for convenience at this point; a more general derivation can
be made after considering the material in Chapter 6. The mean-square value of $\hat{\bar{X}}$
can be expressed as

$$E[(\hat{\bar{X}})^2] = E\left[\frac{1}{N^2}\sum_{i=1}^{N}\sum_{j=1}^{N} X_i X_j\right] = \frac{1}{N^2}\sum_{i=1}^{N}\sum_{j=1}^{N} E[X_i X_j]$$    (5-10)


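where the double summation comes from the product of two summations. Since
the sample values have been assumed to be statistically independent, it follows
that

$$E[X_i X_j] = \begin{cases} \overline{X^2} & i = j \\ (\bar{X})^2 & i \ne j \end{cases}$$

Thus,

$$E[(\hat{\bar{X}})^2] = \frac{1}{N^2}\left[N\,\overline{X^2} + (N^2 - N)(\bar{X})^2\right]$$    (5-11)

This results from the fact that the double summation of (5-10) contains $N^2$ terms
all together, but only $N$ of these correspond to $i = j$. Equation (5-11) can be
written as

$$E[(\hat{\bar{X}})^2] = \frac{1}{N}\overline{X^2} + \left(1 - \frac{1}{N}\right)(\bar{X})^2 = \frac{1}{N}\sigma_X^2 + (\bar{X})^2$$    (5-12)

The variance of $\hat{\bar{X}}$ can now be written as

$$\operatorname{Var}(\hat{\bar{X}}) = E[(\hat{\bar{X}})^2] - \{E[\hat{\bar{X}}]\}^2 = \frac{1}{N}\sigma_X^2 + (\bar{X})^2 - (\bar{X})^2 = \frac{1}{N}\sigma_X^2$$    (5-13)
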

This result says that the variance of the estimate of the mean value is simply 1/N 
times the variance of the process. Thus, the quality of the estimate can be made 
better by averaging a larger number of samples. 
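The $1/N$ behavior of (5-13) can be observed in a small Monte Carlo sketch. The Gaussian samples, the mean of 1, the standard deviation of 2, the number of trials, and the function name `variance_of_mean_estimate` are assumptions made only for this illustration; any process with independent samples should behave similarly.

```python
import random
import statistics


def variance_of_mean_estimate(n_samples, mean=1.0, sigma=2.0, n_trials=4000, seed=0):
    """Empirical variance of the N-sample average over many independent trials."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_trials):
        samples = [rng.gauss(mean, sigma) for _ in range(n_samples)]
        estimates.append(sum(samples) / n_samples)  # the estimate of (5-8)
    return statistics.pvariance(estimates)


if __name__ == "__main__":
    sigma = 2.0
    for n in (5, 25, 100):
        observed = variance_of_mean_estimate(n, sigma=sigma)
        print(f"N = {n:3d}: observed = {observed:.4f}, sigma^2 / N = {sigma ** 2 / n:.4f}")
```

If the samples were not independent, the observed variance would generally decrease more slowly than $\sigma_X^2/N$, which is why the spacing assumption matters.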

As an illustration of the above result, suppose it is desired to estimate the
variance of a zero-mean Gaussian random process by passing it through a square law
device and estimating the mean value of the output. Suppose it is also desired to
find the number of sample values that must be averaged in order to be assured
that the standard deviation of the resulting estimate is less than 10 percent of the
true mean value.

Let the observed sample function of the zero-mean Gaussian process be $Y(t)$
and have a variance of $\sigma_Y^2$. After this sample function is squared, it is designated
as $X(t)$. Thus,

$$X(t) = Y^2(t)$$

From (2-27) it follows that

$$\bar{X} = E[Y^2] = \sigma_Y^2$$
$$\overline{X^2} = E[Y^4] = 3\sigma_Y^4$$

Hence,

$$\sigma_X^2 = \overline{X^2} - (\bar{X})^2 = 3\sigma_Y^4 - \sigma_Y^4 = 2\sigma_Y^4$$

It is clear from this that an estimate of $\bar{X}$ is also an estimate of $\sigma_Y^2$. Furthermore,
the variance of the estimate of $\bar{X}$ must be $0.01(\bar{X})^2 = 0.01\sigma_Y^4$ to meet the
requirement of an error of less than 10 percent. From (5-13)

$$\operatorname{Var}(\hat{\bar{X}}) = \frac{1}{N}\sigma_X^2 = \frac{1}{N}(2\sigma_Y^4) = 0.01\sigma_Y^4$$
Thus, N = 200 statistically independent samples are required to achieve the de- 
sired accuracy. 
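The arithmetic of this example, together with a simulation of the square-law measurement, can be sketched as follows. The helper names `required_samples` and `simulated_relative_std`, the value $\sigma_Y = 1.5$, and the number of trials are assumptions introduced for illustration only.

```python
import math
import random
import statistics


def required_samples(relative_std, variance_to_mean_sq):
    """Smallest N with sigma_X^2 / N <= (relative_std * mean)^2, from (5-13)."""
    # The small offset guards against floating-point round-off at exact integers.
    return math.ceil(variance_to_mean_sq / relative_std ** 2 - 1e-9)


def simulated_relative_std(n_samples, sigma_y=1.5, n_trials=2000, seed=0):
    """Std of the N-sample average of Y^2, divided by the true mean sigma_y^2."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_trials):
        total = 0.0
        for _ in range(n_samples):
            y = rng.gauss(0.0, sigma_y)
            total += y * y  # output of the square-law device
        estimates.append(total / n_samples)
    return statistics.pstdev(estimates) / sigma_y ** 2


if __name__ == "__main__":
    # For a squared zero-mean Gaussian, sigma_X^2 / (mean of X)^2 = 2.
    n = required_samples(0.10, 2.0)
    print("N =", n)
    print("simulated relative standard deviation:", simulated_relative_std(n))
```

The simulated ratio should come out close to 0.1 whatever value is chosen for `sigma_y`, since both the standard deviation of the estimate and the true mean scale with $\sigma_Y^2$.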

The preceding not only illustrates the problems in estimating the mean value of 
a random process, but also indicates how the variance of a zero-mean process 
might be estimated. The same general procedures can obviously be extended to 
estimate the variance of a nonzero-mean random process. 

When the process whose variance is to be estimated has an unknown mean
value, the procedure for estimating the variance becomes a little more involved.
At first thought, it would seem that the logical thing to do is to find the average
of the $X_i^2$ and then subtract out the square of the estimated mean as given by
Equation (5-8). It turns out, however, that the resulting estimate of the variance
is biased; that is, the mean value of the estimate is not the true variance. This
result occurs because the true mean is unknown. It is possible, however, to correct
for this lack of knowledge by defining the estimate of the variance as

$$\hat{\sigma}_X^2 = \frac{1}{N-1}\sum_{i=1}^{N} X_i^2 - \frac{N}{N-1}(\hat{\bar{X}})^2$$    (5-14)
It is left as an exercise for the student to show that the mean value of this estimate
is indeed the true variance. The student should also compare this result with a
similar result shown in Equation (4-8) of the preceding chapter.



Exercise 5-6.1


Ten independent measurements of a voltage from a Gaussian random 
process have the following values: 


207 198 
202 197 
184 213 
204 191 
206 201 


a) Estimate the mean value of this process. 
b) Find the variance of this estimate of the mean. 


c) Estimate the variance of this process.


Answer: 4.9, 69.34, 200.3 
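Parts (a) and (c) of this exercise can be checked with a few lines of code. The sketch below applies the sample mean of (5-8) and the variance estimate in the form of (5-14) to the ten listed voltages; the names `VOLTAGES`, `estimate_mean`, and `estimate_variance` are illustrative, not from the text. The comparison with `statistics.variance` relies on that function using the same $N - 1$ normalization.

```python
import statistics

# The ten voltage measurements listed in Exercise 5-6.1.
VOLTAGES = [207, 198, 202, 197, 184, 213, 204, 191, 206, 201]


def estimate_mean(samples):
    """Sample mean, the discrete-time estimate (5-8)."""
    return sum(samples) / len(samples)


def estimate_variance(samples):
    """Variance estimate in the form of (5-14)."""
    n = len(samples)
    mean_hat = estimate_mean(samples)
    sum_sq = sum(x * x for x in samples)
    return sum_sq / (n - 1) - n / (n - 1) * mean_hat ** 2


if __name__ == "__main__":
    print("estimated mean:     ", estimate_mean(VOLTAGES))
    print("estimated variance: ", estimate_variance(VOLTAGES))
    print("statistics.variance:", statistics.variance(VOLTAGES))
```

Dividing the sum of squares by $N$ instead of $N - 1$, with the corresponding change in the second term, would give the biased estimate described above.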


Exercise 5—6.2 


Show that the estimate of the variance given by Equation (5-14) is an
unbiased estimate. That is,


$$E[\hat{\sigma}_X^2] = \sigma_X^2$$





PROBLEMS 


5-1.1 A sample function from a random process is generated by rolling a die
five times. During the interval from i-1 to i the value of the sample function
is equal to the outcome of the ith roll of the die.

a) Sketch the resulting sample function if the outcomes of the five rolls
are 5, 2, 6, 4, 1.

b) How many different sample functions does the ensemble of this random
process contain?

c) What is the probability that the particular sample function observed
in part (a) will occur?

d) What is the probability that the sample function consisting entirely of
threes will occur?


5-1.2 The random number generator in a computer generates three digit numbers
that are uniformly distributed between 0.000 and 0.999 at a rate of
one random number per second starting at $t = 0$. A sample function from
a random process is generated by summing the ten most recent random
numbers and assigning this sum as the value of the sample function during
each one second time interval. The sample functions are denoted as $X(t)$
for $t \ge 0$.

a) Find the mean value of the random variable $X(4.5)$.

b) Find the mean value of the random variable $X(9.5)$.

c) Find the mean value of the random variable $X(20.5)$.


5-2.1 Classify each of the following random processes as continuous, discrete,
or mixed.

a) A random process in which the random variable is the number of cars
per minute passing a given traffic counter.

b) The thermal noise voltage generated by a resistor.

c) The random process defined in Problem 5-1.2.

d) The random process that results when a Gaussian random process is
passed through an ideal half-wave rectifier.

e) The random process that results when a Gaussian random process is
passed through an ideal full-wave rectifier.

f) A random process having sample functions of the form
$$X(t) = A\cos(Bt + \theta)$$
where $A$ is a constant, $B$ is a random variable that is exponentially
distributed from 0 to $\infty$, and $\theta$ is a random variable that is uniformly
distributed between 0 and $2\pi$.


5-2.2 A Gaussian random process having a mean value of 2 and a variance of
4 is passed through an ideal half-wave rectifier.

a) Let $X_p(t)$ represent the random process at the output of the half-wave
rectifier if the positive portions of the input appear in the output.
Determine the probability density function of $X_p(t)$.

b) Let $X_n(t)$ represent the random process at the output of the half-wave
rectifier if the negative portions of the input appear in the output.
Determine the probability density function of $X_n(t)$.

c) Determine the probability density function of $X_p(t)X_n(t)$.


5-3.1 State whether each of the random processes described in Problem 5-2.1
is deterministic or nondeterministic.


5-3.2 Sample functions from a deterministic random process are described by
$$X(t) = \begin{cases} At + B & t \ge 0 \\ 0 & t < 0 \end{cases}$$
where $A$ is a Gaussian random variable with zero mean and a variance of
9 and $B$ is a random variable that is uniformly distributed between 0 and
6. $A$ and $B$ are statistically independent.

a) Find the mean value of this process.

b) Find the variance of this process.

c) If a particular sample function is found to have a value of 10 at $t = 2$
and a value of 20 at $t = 4$, find the value of the sample function
at $t = 8$.


5-4.1 State whether each of the random processes described in Problem 5-2.1
may reasonably be considered to be stationary or nonstationary. If you
describe a process as nonstationary, state the reason for this claim.


5-4.2 a) Is the process described in Problem 5-3.2 stationary or nonstationary?
Why?

b) A random process is described by
$$X(t) = A + B\cos(\omega t + \theta)$$
where $A$ is a random variable that is uniformly distributed between
$-5$ and $+5$, $B$ is a Gaussian random variable with zero mean and a
variance of 25, $\omega$ is a constant, and $\theta$ is a random variable that is
uniformly distributed from $-\pi/2$ to $+3\pi/2$. $A$, $B$, and $\theta$ are
statistically independent. Calculate the mean and variance of this process.
Is this process stationary in the wide sense?


5-5.1 State whether each of the processes described in Problem 5-2.1 is ergodic 
or nonergodic. If you claim a process is nonergodic, explain why. 


5-5.2 State whether each of the processes described in Problem 5-4.2 is ergodic
or nonergodic and give reasons for your decision.


5-6.1 A stationary random process is sampled at time instants separated by 0.01
seconds. The resulting sample values are tabulated below.

     i     x(i)        i     x(i)        i     x(i)
     0     0.19        7    -1.24       14     1.45
     1     0.29        8    -1.88       15    -0.82
     2     1.44        9    -0.31       16     0.25
     3     0.83       10     1.18       17     0.23
     4    -0.01       11     1.70       18    -0.91
     5    -1.23       12     0.57       19    -0.19
     6    -1.47       13     0.95       20     0.24

a) Estimate the mean value of this process.

b) If the process has a true variance of 1.0, find the variance of your
estimate of the mean.


5-6.2 Estimate the variance of the process in Problem 5-6.1. 


CHAPTER 6





Correlation 
Functions 


6—1 Introduction 


The subject of correlation between two random variables was introduced in Sec- 
tion 3—4. Now that the concept of a random process has also been introduced, it 
is possible to relate these two subjects to provide a statistical (rather than a prob- 
abilistic) description of random processes. Although a probabilistic description is 
the most complete one, since it incorporates all the knowledge that is available 
about a random process, there are many engineering situations in which this de- 
gree of completeness is neither needed nor possible. If the major interest in a 
random quantity is in its average power, or the way in which that power is dis- 
tributed with frequency, then the entire probability model is not needed. If the 
probability distributions of the random quantities are not known, use of the prob- 
ability model is not even possible. In either case, a partial statistical description, 
in terms of certain average values, may provide an acceptable substitute for the 
probability description. 

It was noted in Section 3—4 that the correlation between two random variables 
was the expected value of their product. If the two random variables are defined 
as samples of a random process at two different time instants, then this expected 
value depends upon how rapidly the time functions can change. We would expect 
that the random variables would be highly correlated when the two time instants 
are very close together, because the time function cannot change rapidly enough 
to be greatly different. On the other hand, we would expect to find very little 
correlation between the values of the random variables when the two time instants
are widely separated, because almost any change can take place. Because the
correlation does depend upon how rapidly the values of the random variable can 
change with respect to time, we expect that this correlation may also be related to 
the manner in which the energy content of a random process is distributed with 
respect to frequency. This is because a time function must have appreciable en- 
ergy at high frequencies in order to be able to change rapidly with time. This 
aspect of random processes is discussed in more detail in subsequent chapters. 

The previously defined correlation was simply a number since the random var- 
iables were not necessarily defined as being associated with time functions. In the 
following case, however, every pair of random variables can be related by the 
time separation between them, and the correlation will be a function of this sepa- 
ration. Thus, it becomes appropriate to define a correlation function in which the 
argument is the time separation of the two random variables. If the two random 
variables come from the same random process, this function will be known as the 
autocorrelation function. If they come from different random processes, it will be 
called the crosscorrelation function. We will consider autocorrelation functions 
first. 

If $X(t)$ is a sample function from a random process, and the random variables
are defined to be

$$X_1 = X(t_1)$$
$$X_2 = X(t_2)$$

then the autocorrelation function is defined to be

$$R_X(t_1, t_2) = E[X_1 X_2] = \int_{-\infty}^{\infty} dx_1 \int_{-\infty}^{\infty} x_1 x_2 f(x_1, x_2)\,dx_2$$    (6-1)


This definition is valid for both stationary and nonstationary random processes. 
However, our interest is primarily in stationary processes, for which further sim- 
plification of (6—1) is possible. It may be recalled from the previous chapter that 
for a wide-sense stationary process all such ensemble averages are independent of 
the time origin. Accordingly, for a wide-sense stationary process, 


$$R_X(t_1, t_2) = R_X(t_1 + T, t_2 + T) = E[X(t_1 + T)X(t_2 + T)]$$

Since this expression is independent of the choice of time origin, we can set $T = -t_1$
to give

$$R_X(t_1, t_2) = R_X(0, t_2 - t_1) = E[X(0)X(t_2 - t_1)]$$

It is seen that this expression depends only on the time difference $t_2 - t_1$. Setting
this time difference equal to $\tau = t_2 - t_1$ and suppressing the zero in the argument
of $R_X(0, t_2 - t_1)$, we can rewrite (6-1) as

$$R_X(\tau) = E[X(t_1)X(t_1 + \tau)]$$    (6-2)


This is the expression for the autocorrelation function of a stationary process and
depends only on $\tau$ and not on the value of $t_1$. Because of this lack of dependence
on the particular time $t_1$ at which the ensemble averages are taken, it is common
practice to write (6-2) without the subscript; thus,

$$R_X(\tau) = E[X(t)X(t + \tau)]$$

Whenever correlation functions relate to nonstationary processes, since they are
dependent on the particular time at which the ensemble average is taken as well
as on the time difference between samples, they must be written as $R_X(t_1, t_2)$ or
$R_X(t_1, \tau)$. In all cases in this and subsequent chapters, unless specifically stated
otherwise, it is assumed that all correlation functions relate to wide-sense stationary
random processes.

It is also possible to define a time autocorrelation function for a particular
sample function as¹

$$\mathcal{R}_x(\tau) = \lim_{T\to\infty} \frac{1}{2T}\int_{-T}^{T} x(t)x(t + \tau)\,dt = \langle x(t)x(t + \tau)\rangle$$    (6-3)

For the special case of an ergodic process, $\langle x(t)x(t + \tau)\rangle$ is the same for every $x(t)$
and equal to $R_X(\tau)$. That is,

$$\mathcal{R}_x(\tau) = R_X(\tau) \qquad \text{for an ergodic process}$$    (6-4)


The assumption of ergodicity, where it is not obviously invalid, often simplifies 
the computation of correlation functions. 
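In practice the limit in (6-3) has to be replaced by an average over a finite record of samples. The sketch below is one simple way of doing this, not a method prescribed by the text: the function name `time_autocorrelation`, the sampling interval, and the sinusoidal test signal are assumptions made for illustration. For a sinusoid of unit amplitude the time autocorrelation is $\tfrac{1}{2}\cos\omega\tau$, which gives something to compare against.

```python
import math


def time_autocorrelation(x, lag):
    """Finite-record estimate of <x(t) x(t + tau)> at tau = lag * dt."""
    n = len(x) - lag
    return sum(x[i] * x[i + lag] for i in range(n)) / n


if __name__ == "__main__":
    dt = 0.01                  # assumed sampling interval
    omega = 2.0 * math.pi      # assumed frequency (period of 100 samples)
    x = [math.cos(omega * k * dt + 0.4) for k in range(10_000)]
    for lag in (0, 25, 50):
        tau = lag * dt
        estimate = time_autocorrelation(x, lag)
        exact = 0.5 * math.cos(omega * tau)
        print(f"tau = {tau:.2f}: estimate = {estimate:+.4f}, exact = {exact:+.4f}")
```

The record here is one-sided and finite, so the estimate carries a small error that shrinks as the record is lengthened.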

¹The symbol $\langle\ \rangle$ is used to denote time averaging.

From (6-2) it is seen readily that for $\tau = 0$, since $R_X(0) = E[X(t_1)X(t_1)]$, the
autocorrelation function is equal to the mean-square value of the process. For
values of $\tau$ other than $\tau = 0$, the autocorrelation function $R_X(\tau)$ can be thought
of as a measure of the similarity of the waveform $X(t)$ and the waveform $X(t + \tau)$.
In order to illustrate this point further, let $X(t)$ be a sample function from a
zero-mean stationary random process and form the new function

$$Y(t) = X(t) - \rho X(t + \tau)$$

By determining the value of $\rho$ that minimizes the mean-square value of $Y(t)$ we
will have a measure of how much of the waveform $X(t + \tau)$ is contained in the
waveform $X(t)$. The determination of $\rho$ is made by computing the variance of
$Y(t)$, setting the derivative of the variance with respect to $\rho$ equal to zero, and
solving for $\rho$. The operations are as follows:

$$\begin{aligned}
E\{[Y(t)]^2\} &= E\{[X(t) - \rho X(t + \tau)]^2\} \\
&= E\{X^2(t) - 2\rho X(t)X(t + \tau) + \rho^2 X^2(t + \tau)\} \\
\sigma_Y^2 &= \sigma_X^2 - 2\rho R_X(\tau) + \rho^2\sigma_X^2 \\
\frac{d\sigma_Y^2}{d\rho} &= -2R_X(\tau) + 2\rho\sigma_X^2 = 0 \\
\rho &= \frac{R_X(\tau)}{\sigma_X^2}
\end{aligned}$$    (6-5)


It is seen from (6-5) that $\rho$ is directly related to $R_X(\tau)$ and is exactly the
correlation coefficient defined in Section 3-4. The coefficient $\rho$ can be thought of as
the fraction of the waveshape of $X(t)$ remaining after $\tau$ seconds have elapsed. It
must be remembered that $\rho$ was calculated on a statistical basis, and that it is the
average retention of waveshape over the ensemble, and not this property in any
particular sample function, that is important. As shown previously, the correlation
coefficient $\rho$ can vary from $+1$ to $-1$. For a value of $\rho = 1$, the waveshapes
would be identical; that is, completely correlated. For $\rho = 0$, the waveforms
would be completely uncorrelated; that is, no part of the waveform $X(t + \tau)$
would be contained in $X(t)$. For $\rho = -1$, the waveshapes would be identical,
except for opposite signs; that is, the waveform $X(t + \tau)$ would be the negative
of $X(t)$.

For an ergodic process or for nonrandom signals, the foregoing interpretation 
can be made in terms of average power instead of variance and in terms of the 
time correlation function instead of the ensemble correlation function. 
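The time-average form of this interpretation can be tried on a sampled waveform. The sketch below is an illustration under assumptions of its own: the names `best_rho` and `residual_power`, the sinusoidal test signal, and the lag are hypothetical choices. It computes the value of $\rho$ that minimizes the average power of $x(t) - \rho x(t + \tau)$ as the ratio of the time correlation to the average power, and then confirms that moving $\rho$ away from that value increases the residual power.

```python
import math


def best_rho(x, lag):
    """Value of rho minimizing the time-averaged square of x(t) - rho x(t + tau)."""
    n = len(x) - lag
    cross = sum(x[i] * x[i + lag] for i in range(n)) / n   # <x(t) x(t + tau)>
    power = sum(x[i + lag] ** 2 for i in range(n)) / n     # <x^2(t + tau)>
    return cross / power


def residual_power(x, lag, rho):
    """Time-averaged square of y(t) = x(t) - rho x(t + tau)."""
    n = len(x) - lag
    return sum((x[i] - rho * x[i + lag]) ** 2 for i in range(n)) / n


if __name__ == "__main__":
    # Assumed test signal: a sinusoid with a period of 100 samples.
    x = [math.cos(2.0 * math.pi * 0.01 * k + 1.0) for k in range(10_000)]
    lag = 10
    rho = best_rho(x, lag)
    print("rho =", rho)
    for trial in (rho - 0.1, rho, rho + 0.1):
        print(f"rho = {trial:.3f}: residual power = {residual_power(x, lag, trial):.5f}")
```

For this signal the computed $\rho$ should be close to $\cos\omega\tau$, the fraction of the waveshape that survives the shift.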

Since $R_X(\tau)$ is dependent both on the amount of correlation $\rho$ and the variance
of the process, $\sigma_X^2$, it is not possible to estimate the significance of some particular
value of $R_X(\tau)$ without knowing one or the other of these quantities. For
example, if the random process has a zero mean and the autocorrelation function
has a positive value, the most that can be said is that the random variables $X(t_1)$
and $X(t_1 + \tau)$ probably have the same sign.² If the autocorrelation function has a
negative value, it is likely that the random variables have opposite signs. If it is
nearly zero, the random variables are about as likely to have opposite signs as
they are to have the same sign.

²This is strictly true only if $f(x_1)$ is symmetrical about the axis $x_1 = 0$.





Exercise 6-1.1


A random process has sample functions of the form
$$X(t) = \begin{cases} A & 0 \le t \le 1 \\ 0 & \text{elsewhere} \end{cases}$$
where $A$ is a random variable that is uniformly distributed from $-12$
to 12. Using the basic definition of the autocorrelation function as
given by Equation (6-1), find the autocorrelation function of this process.
Answer:
$$R_X(t_1, t_2) = \begin{cases} 48 & 0 \le t_1, t_2 \le 1 \\ 0 & \text{elsewhere} \end{cases}$$


Exercise 6—1.2 


Define a random variable $Z(t)$ as
$$Z(t) = X(t) + X(t + \tau_1)$$


where $X(t)$ is a sample function from a stationary random process
whose autocorrelation function is


$$R_X(\tau) = \exp(-\tau^2)$$
Write an expression for the autocorrelation function of the random process
$Z(t)$.
Answer:


$$R_Z(\tau) = 2\exp(-\tau^2) + \exp[-(\tau - \tau_1)^2] + \exp[-(\tau + \tau_1)^2]$$





6—2 Example: Autocorrelation Function of a 
Binary Process 


The above ideas may be made somewhat clearer by considering, as a special
example, a random process having a very simple autocorrelation function. Figure
6-1 shows a typical sample function from a discrete, stationary, zero-mean random
process in which only two values, $\pm A$, are possible. The sample function
either can change from one value to the other every $t_a$ seconds or remain the same,
with equal probability. The time $t_0$ is a random variable with respect to the ensemble
of possible time functions and is uniformly distributed over an interval of
length $t_a$. This means, as far as the ensemble is concerned, that changes in value
can occur at any time with equal probability. It is also assumed that the value of
$X(t)$ in any one interval is statistically independent of its value in any other
interval.

Figure 6-1 A discrete, stationary sample function.


Although the random process described in the above paragraph may seem con- 
trived, it actually represents a very practical situation. In modern digital commu- 
nication systems, the messages to be conveyed are converted into binary symbols. 
This is done by first sampling the message at periodic time instants and then 
quantizing the samples into a finite number of amplitude levels as discussed in 
Section 2-7 in connection with the uniform probability density function. Each 
amplitude level is then represented by a block of binary symbols; for example, 
256 amplitude levels can each be uniquely represented by a block of 8 binary 
symbols. The binary symbols can in turn be represented by a voltage level of +A 
or —A. Thus, a sequence of binary symbols becomes a waveform of the type 
shown in Figure 6—1. Similarly, this waveform is typical of those found in digital 
computers or in communication links connecting computers together. Hence, the 
random process being considered here is not only one of the simplest ones to 
analyze, but is also one of the most practical ones in the real world. 

The autocorrelation function of this process will be determined by heuristic
arguments rather than by rigorous derivation. In the first place, when $|\tau|$ is larger
than $t_a$, then $t_1$ and $t_1 + \tau = t_2$ cannot lie in the same interval, and $X_1$ and $X_2$ are
statistically independent. Since $X_1$ and $X_2$ have zero mean, the expected value of
their product must be zero, as shown by (3-22); that is,


$$R_X(\tau) = E[X_1 X_2] = \bar{X}_1 \bar{X}_2 = 0 \qquad |\tau| > t_a$$


since $\bar{X}_1 = \bar{X}_2 = 0$. When $|\tau|$ is less than $t_a$, then $t_1$ and $t_1 + \tau$ may or may not
be in the same interval, depending upon the value of $t_0$. Since $t_0$ can be anywhere,
with equal probability, the probability that they do lie in the same interval is
proportional to the difference between $t_a$ and $\tau$. In particular, for $\tau \ge 0$, it is seen
that $t_0 \le t_1 \le t_1 + \tau < t_0 + t_a$, which yields $t_1 + \tau - t_a < t_0 \le t_1$. Hence,

$$\begin{aligned}
\Pr(t_1 \text{ and } t_1 + \tau \text{ are in the same interval}) &= \Pr[(t_1 + \tau - t_a < t_0 \le t_1)] \\
&= \frac{1}{t_a}[t_1 - (t_1 + \tau - t_a)] = \frac{t_a - \tau}{t_a}
\end{aligned}$$

since the probability density function for $t_0$ is just $1/t_a$. When $\tau < 0$, it is seen
that $t_0 \le t_1 + \tau \le t_1 < t_0 + t_a$, which yields $t_1 - t_a < t_0 \le t_1 + \tau$. Thus,

$$\begin{aligned}
\Pr(t_1 \text{ and } t_1 + \tau \text{ are in the same interval}) &= \Pr[(t_1 - t_a) < t_0 \le (t_1 + \tau)] \\
&= \frac{1}{t_a}[t_1 + \tau - (t_1 - t_a)] = \frac{t_a + \tau}{t_a}
\end{aligned}$$

Hence, in general,

$$\Pr(t_1 \text{ and } t_1 + \tau \text{ are in same interval}) = \frac{t_a - |\tau|}{t_a}$$

When they are in the same interval, the product of $X_1$ and $X_2$ is always $A^2$; when
they are not, the expected product is zero. Hence,

$$R_X(\tau) = \begin{cases} A^2\left[\dfrac{t_a - |\tau|}{t_a}\right] = A^2\left[1 - \dfrac{|\tau|}{t_a}\right] & 0 \le |\tau| \le t_a \\ 0 & |\tau| > t_a \end{cases}$$    (6-6)

Figure 6-2 Autocorrelation function of the process in Figure 6-1.



This function is sketched in Figure 6-2. 
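The heuristic argument leading to (6-6) can be checked by direct simulation. The sketch below is a minimal illustration under stated assumptions: the function names `binary_autocorrelation` and `triangle`, the default amplitude and interval length, and the number of trials are all hypothetical. Each trial draws a random switching offset, decides whether the two time instants fall in the same interval, and assigns equal or independent values accordingly.

```python
import math
import random


def binary_autocorrelation(tau, amp=1.0, t_a=1.0, n_trials=40_000, seed=0):
    """Monte Carlo estimate of E[X(t1) X(t1 + tau)] for the random binary process."""
    rng = random.Random(seed)
    t1 = 0.0  # any fixed instant will do, since the process is stationary
    total = 0.0
    for _ in range(n_trials):
        t0 = rng.uniform(0.0, t_a)               # random position of the switching instants
        k1 = math.floor((t1 - t0) / t_a)         # index of the interval containing t1
        k2 = math.floor((t1 + tau - t0) / t_a)   # index of the interval containing t1 + tau
        x1 = rng.choice((-amp, amp))
        # Same interval: same value. Different intervals: independent values.
        x2 = x1 if k1 == k2 else rng.choice((-amp, amp))
        total += x1 * x2
    return total / n_trials


def triangle(tau, amp=1.0, t_a=1.0):
    """Closed form of (6-6)."""
    return amp ** 2 * (1.0 - abs(tau) / t_a) if abs(tau) <= t_a else 0.0


if __name__ == "__main__":
    for tau in (0.0, 0.25, -0.5, 0.9, 1.5):
        print(f"tau = {tau:+.2f}: simulated = {binary_autocorrelation(tau):+.3f}, "
              f"(6-6) = {triangle(tau):+.3f}")
```

The simulated values should follow the triangular shape for both signs of $\tau$, in keeping with the symmetry of the result.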

It is interesting to consider the physical interpretation of this autocorrelation
function in the light of the previous discussion. Note that when $|\tau|$ is small (less
than $t_a$), there is an increased probability that $X(t_1)$ and $X(t_1 + \tau)$ will have the
same value, and the autocorrelation function is positive. When $|\tau|$ is greater than
$t_a$ it is equally probable that $X(t_1)$ and $X(t_1 + \tau)$ will have the same value as that
they will have opposite values, and the autocorrelation function is zero. For $\tau = 0$
the autocorrelation function yields the mean-square value of $A^2$.





Exercise 6—2.1 


A speech waveform is sampled 8000 times a second and each sample
is quantized into 128 amplitude levels. The resulting amplitude levels
are represented by a binary voltage having values of $\pm 2$. Assuming
that successive binary symbols are statistically independent, write
the autocorrelation function of the binary process.


Answer:


$$R_X(\tau) = \begin{cases} 4[1 - 56{,}000\,|\tau|] & 0 \le |\tau| \le \dfrac{1}{56{,}000} \\ 0 & \text{elsewhere} \end{cases}$$





Exercise 6-2.2 





A sample function from a stationary random process is shown above.
The quantity $t_0$ is a random variable that is uniformly distributed from
0 to $t_a$ and the pulse amplitudes are $\pm A$ with equal probability and are
independent from pulse to pulse. Find the autocorrelation function of
this process.


Answer:
$$R_X(\tau) = \begin{cases} A^2 \dfrac{b}{t_a}\left[1 - \dfrac{|\tau|}{b}\right] & |\tau| \le b \\ 0 & |\tau| > b \end{cases}$$





6—3 Properties of Autocorrelation Functions 


If autocorrelation functions are to play a useful role in representing random processes
and in the analysis of systems with random inputs, it is necessary to be
able to relate the properties of the autocorrelation function to the properties of the
random process it represents. In this section, a number of the properties that are
possessed by all autocorrelation functions of stationary and ergodic random processes
are summarized. The student should pay particular attention to these properties
because they will come up many times in future discussions.

1. $R_X(0) = \overline{X^2}$. Hence, the mean-square value of the random process can always
be obtained simply by setting $\tau = 0$.

It should be emphasized that Rx(0) gives the mean-square value whether the 
process has a nonzero mean value or not. If the process is zero mean, then the 
mean-square value is equal to the variance of the process. 

2. $R_X(\tau) = R_X(-\tau)$. The autocorrelation function is an even function of $\tau$.

This is most easily seen, perhaps, by thinking of the time averaged autocorrelation
function, which is the same as the ensemble averaged autocorrelation function
for an ergodic random process. In this case, the time average is taken over
exactly the same product function regardless of which direction one of the time
functions is shifted. This symmetry property is extremely useful in deriving the
autocorrelation function of a random process because it implies that the derivation
needs to be carried out only for positive values of $\tau$ and the result for negative $\tau$
determined by symmetry. Thus, in the derivation shown in the example in Section
6-2, it would have been necessary to consider only the case for $\tau \ge 0$. For a
nonstationary process, the symmetry property does not necessarily apply.

3. $|R_X(\tau)| \le R_X(0)$. The largest value of the autocorrelation function always
occurs at $\tau = 0$. There may be other values of $\tau$ for which it is just as big (for
example, see the periodic case below), but it cannot be larger. This is shown
easily by considering

$$E[(X_1 \pm X_2)^2] = E[X_1^2 + X_2^2 \pm 2X_1X_2] \ge 0$$

$$E[X_1^2 + X_2^2] = 2R_X(0) \ge |E(2X_1X_2)| = |2R_X(\tau)|$$

and thus,

$$R_X(0) \ge |R_X(\tau)| \tag{6-7}$$


4. If $X(t)$ has a dc component or mean value, then $R_X(\tau)$ will have a constant
component. For example, if $X(t) = A$, then

$$R_X(\tau) = E[X(t_1)X(t_1 + \tau)] = E[A \cdot A] = A^2 \tag{6-8}$$

More generally, if $X(t)$ has a mean value and a zero-mean component $N(t)$ so that

$$X(t) = \bar{X} + N(t)$$

then

$$\begin{aligned}
R_X(\tau) &= E\{[\bar{X} + N(t_1)][\bar{X} + N(t_1 + \tau)]\} \\
&= E[(\bar{X})^2 + \bar{X}N(t_1) + \bar{X}N(t_1 + \tau) + N(t_1)N(t_1 + \tau)] \\
&= (\bar{X})^2 + R_N(\tau)
\end{aligned} \tag{6-9}$$

since

$$E[N(t_1)] = E[N(t_1 + \tau)] = 0$$

Thus, even in this case, $R_X(\tau)$ contains a constant component.

For ergodic processes the magnitude of the mean value of the process can be
determined by looking at the autocorrelation function as $\tau$ approaches infinity,
provided that any periodic components in the autocorrelation function are ignored
in the limit. Since only the square of the mean value is obtained from this
calculation, it is not possible to determine the sign of the mean value. If the process is
stationary, but not ergodic, the value of $R_X(\tau)$ may not yield any information
regarding the mean value. For example, a random process having sample
functions of the form

$$X(t) = A$$

where $A$ is a random variable with zero mean and variance $\sigma_A^2$, has an
autocorrelation function of

$$R_X(\tau) = \sigma_A^2$$

for all $\tau$. Thus, the autocorrelation function does not vanish at $\tau = \infty$ even though
the process has zero mean. This strange result is a consequence of the process
being nonergodic and would not occur for an ergodic process.

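5. If $X(t)$ has a periodic component, then $R_X(\tau)$ will also have a periodic
component, with the same period. For example, let

$$X(t) = A \cos(\omega t + \theta)$$

where $A$ and $\omega$ are constants and $\theta$ is a random variable uniformly distributed over
a range of $2\pi$. That is,

$$f(\theta) = \begin{cases} \dfrac{1}{2\pi}, & 0 \le \theta \le 2\pi \\ 0, & \text{elsewhere} \end{cases}$$

Then

$$\begin{aligned}
R_X(\tau) &= E[A \cos(\omega t_1 + \theta)\,A \cos(\omega t_1 + \omega\tau + \theta)] \\
&= E\left[\frac{A^2}{2} \cos(2\omega t_1 + \omega\tau + 2\theta) + \frac{A^2}{2} \cos \omega\tau\right] \\
&= \frac{A^2}{2} \int_0^{2\pi} \frac{1}{2\pi} [\cos(2\omega t_1 + \omega\tau + 2\theta) + \cos \omega\tau]\,d\theta \\
&= \frac{A^2}{2} \cos \omega\tau
\end{aligned} \tag{6-10}$$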

In the more general case, in which

$$X(t) = A \cos(\omega t + \theta) + N(t)$$

where $\theta$ and $N(t_1)$ are statistically independent for all $t_1$, by the method used in
obtaining (6-9), it is easy to show that

$$R_X(\tau) = \frac{A^2}{2} \cos \omega\tau + R_N(\tau) \tag{6-11}$$

Hence, the autocorrelation function still contains a periodic component.

The above property can be extended to consider random processes that contain 
any number of periodic components. If the random variables associated with the 
periodic components are statistically independent, then the autocorrelation func- 
tion of the sum of the periodic components is simply the sum of the periodic 
autocorrelation functions of each component. This statement is true regardless of 
whether the periodic components are harmonically related or not. 

If every sample function of the random process is periodic and can be repre- 
sented by a Fourier series, the resulting autocorrelation is also periodic and can 
also be represented by a Fourier series. However, this Fourier series will include 
more than just the sum of the autocorrelation functions of the individual terms if 
the random variables associated with the various components of the sample func- 
tion are not statistically independent. A common situation in which the random 
variables are not independent is the case in which there is only one random vari- 
able for the process, namely a random delay on each sample function that is 
uniformly distributed over the fundamental period. 

6. If $\{X(t)\}$ is ergodic and zero mean, and has no periodic components, then

$$\lim_{|\tau| \to \infty} R_X(\tau) = 0 \tag{6-12}$$

For large values of $\tau$, since the effect of past values tends to die out as time
progresses, the random variables tend to become statistically independent.

7. Autocorrelation functions cannot have an arbitrary shape. One way of
specifying shapes that are permissible is in terms of the Fourier transform of the
autocorrelation function. That is, if

$$\mathcal{F}[R_X(\tau)] = \int_{-\infty}^{\infty} R_X(\tau)\,e^{-j\omega\tau}\,d\tau$$

then the restriction is

$$\mathcal{F}[R_X(\tau)] \ge 0 \quad \text{all } \omega \tag{6-13}$$

The reason for this restriction will become apparent after the discussion of spectral
density in Chapter 7. Among other things, this restriction precludes the existence
of autocorrelation functions with flat tops, vertical sides, or any discontinuity in
amplitude.
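
The restriction in property 7 can be checked numerically for a candidate shape. The following is only a rough sketch, assuming NumPy is available; the helper name `spectrum`, the unit-width triangle and rectangle, and the frequency grid are illustrative choices, not part of the text. Because both candidates are even functions of $\tau$, the transform reduces to a cosine integral, which is approximated here by a simple Riemann sum.

```python
import numpy as np

def spectrum(r_func, omegas, tau_max=2.0, n=40001):
    """Approximate the Fourier transform of an even function R(tau).

    For even R the transform is real: integral of R(tau) * cos(w * tau) dtau.
    The integral is approximated by a Riemann sum on a uniform grid.
    """
    tau = np.linspace(-tau_max, tau_max, n)
    dtau = tau[1] - tau[0]
    r = r_func(tau)
    return np.array([np.sum(r * np.cos(w * tau)) * dtau for w in omegas])

# Hypothetical candidate shapes (both are zero outside |tau| <= 1).
triangle = lambda tau: np.maximum(1.0 - np.abs(tau), 0.0)   # sloped sides
rectangle = lambda tau: (np.abs(tau) <= 1.0).astype(float)  # flat top, vertical sides

omegas = np.linspace(0.0, 20.0, 401)
s_tri = spectrum(triangle, omegas)
s_rect = spectrum(rectangle, omegas)

# The triangle's transform stays non-negative (up to numerical error),
# while the rectangle's transform goes clearly negative.
print("min of triangle transform :", s_tri.min())
print("min of rectangle transform:", s_rect.min())
```

The rectangle fails the test, which is consistent with the statement that flat tops and vertical sides are excluded; a numerical check of this kind can only suggest, not prove, that a shape is admissible.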

There is one further point that should be emphasized in connection with
autocorrelation functions. Although a knowledge of the joint probability density
functions of the random process is sufficient to obtain a unique autocorrelation
function, the converse is not true. There may be many different random processes that
can yield the same autocorrelation function. Furthermore, as will be shown later,
the effect of linear systems on the autocorrelation function of the input can be 
computed without knowing anything about the probability density functions. 
Hence, the specification of the correlation function of a random process is not 
equivalent to the specification of the probability density functions and, in fact, 
represents a considerably smaller amount of information. 
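
Properties 1, 4, and 6 together give a simple recipe for an ergodic process with no periodic component: the mean-square value is $R_X(0)$, the squared mean is the constant that remains as $\tau$ grows large, and the variance is their difference. The sketch below illustrates this; the function name, the example autocorrelation $9e^{-2|\tau|} + 16$, and the choice of a "large" $\tau$ are all assumptions made for illustration.

```python
import math

def moments_from_autocorrelation(r_func, tau_large=1.0e3):
    """Read mean-square value, |mean|, and variance from R(tau).

    Assumes an ergodic process with no periodic component, so that
    R(tau) settles to (mean)^2 for large tau. tau_large is an assumed
    value that must be large compared with the decay time of R.
    """
    mean_square = r_func(0.0)            # property 1: R(0) is the mean-square value
    mean_squared = r_func(tau_large)     # properties 4 and 6: constant left at large tau
    variance = mean_square - mean_squared
    return mean_square, math.sqrt(mean_squared), variance

# Hypothetical example: R(tau) = 9 exp(-2|tau|) + 16
example_r = lambda tau: 9.0 * math.exp(-2.0 * abs(tau)) + 16.0

mean_square, mean_magnitude, variance = moments_from_autocorrelation(example_r)
print(mean_square, mean_magnitude, variance)
```

Only the magnitude of the mean is recovered; as noted above, the sign cannot be determined from the autocorrelation function.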





Exercise 6-3.1 


a) An ergodic random process has an autocorrelation function of 
the form 


Rx(r) = 25e ^" + 16 cos 207 + 36 


Find the mean-square value, mean value, and variance of this 
process. 


b) An ergodic random process has an autocorrelation function of
the form

$$R_X(\tau) = \frac{25\tau^2 + 36}{6.25\tau^2 + 4}$$

Find the mean-square value, mean value, and variance of this
process.


Answers: ±2, 5, ±6, 9, 41, 77


Exercise 6-3.2 


For each of the following functions of $\tau$, determine the largest value of
the constant $A$ for which the function could be a valid autocorrelation
function:


a) 9e ^ - Ae ™ 
b) 10e 4-4 
c) $20 \cos 5\tau + A \sin 5\tau$


Answers: 0, 0, 6 

6-4 Measurement of Autocorrelation Functions


Since the autocorrelation function plays an important role in the analysis of linear 
systems with random inputs, an important practical problem is that of determining 
these functions for experimentally observed random processes. In general, they 
cannot be calculated from the joint density functions, since these density functions 
are seldom known. Nor can an ensemble average be made, because there is usu- 
ally only one sample function from the ensemble available. Under these circum- 
stances, the only available procedure is to calculate a time autocorrelation function 
for a finite time interval, under the assumption that the process is ergodic. 

In order to illustrate this, assume that a particular voltage or current waveform
$x(t)$ has been observed over a time interval from 0 to $T$ seconds. It is then possible
to define an estimated correlation function for this particular waveform as

$$\hat{R}_x(\tau) = \frac{1}{T - \tau} \int_0^{T - \tau} x(t)\,x(t + \tau)\,dt \qquad 0 \le \tau \ll T \tag{6-14}$$

Over the ensemble of sample functions, this estimate is a random variable denoted
by $\hat{R}_X(\tau)$. Note that the averaging time is $T - \tau$ rather than $T$ because this is the
only portion of the observed data in which both $x(t)$ and $x(t + \tau)$ are available.
In most practical cases it is not possible to carry out the integration called for
in (6-14) because a mathematical expression for $x(t)$ is not available. An alternative
procedure is to approximate the integral by sampling the continuous time
function at discrete instants of time and performing the discrete equivalent to
(6-14). Thus, if the samples of a particular sample function are taken at time instants
of $0, \Delta t, 2\Delta t, \ldots, N\Delta t$, and if the corresponding values of $x(t)$ are
$x_0, x_1, x_2, \ldots, x_N$, the discrete equivalent to (6-14) is

$$\hat{R}_x(n\,\Delta t) = \frac{1}{N - n + 1} \sum_{k=0}^{N-n} x_k\,x_{k+n} \qquad n = 0, 1, 2, \ldots, M, \quad M \ll N \tag{6-15}$$

This estimate is also a random variable over the ensemble and, as such, is denoted
by $\hat{R}_X(n\,\Delta t)$. Since $N$ is quite large (on the order of several thousand) this operation
is best performed by a digital computer.

In order to evaluate the quality of this estimate it is necessary to determine the
mean and the variance of $\hat{R}_X(n\,\Delta t)$, since it is a random variable whose precise
value depends upon the particular sample function being used and the particular
set of samples taken. The mean is easy enough to obtain since

$$\begin{aligned}
E[\hat{R}_X(n\,\Delta t)] &= E\left[\frac{1}{N - n + 1} \sum_{k=0}^{N-n} X_k X_{k+n}\right] \\
&= \frac{1}{N - n + 1} \sum_{k=0}^{N-n} E[X_k X_{k+n}] = \frac{1}{N - n + 1} \sum_{k=0}^{N-n} R_X(n\,\Delta t) \\
&= R_X(n\,\Delta t)
\end{aligned}$$

Thus, the expected value of the estimate is the true value of the autocorrelation
function and this is an unbiased estimate of the autocorrelation function.
Although the estimate described by (6-15) is unbiased, it is not necessarily the
best estimate in the mean-square error sense and is not the form that is most
commonly used. Instead it is customary to use

$$\hat{R}_x(n\,\Delta t) = \frac{1}{N + 1} \sum_{k=0}^{N-n} x_k\,x_{k+n} \qquad n = 0, 1, 2, \ldots, M \tag{6-16}$$

This is a biased estimate, as can be seen readily from the evaluation of $E[\hat{R}_X(n\,\Delta t)]$
given above for the estimate of (6-15). Since only the factor by which the sum is
divided is different in the present case, the expected value of this new estimate is
simply

$$E[\hat{R}_X(n\,\Delta t)] = \left[1 - \frac{n}{N + 1}\right] R_X(n\,\Delta t)$$

Note that if $N \gg n$, the bias is small. Although this estimate is biased, in most
cases, the total mean-square error is slightly less than for the estimate of (6-15).
Furthermore, (6—16) is slightly easier to calculate. A computer program for car- 
rying out this calculation is given in Appendix G. 
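
The two discrete estimates can be sketched in a few lines. This is not the program of Appendix G; it is a minimal illustration assuming NumPy, with a hypothetical function name and a tiny made-up data record. The samples are taken as $x_0, \ldots, x_N$, so a record of length $N + 1$ is expected.

```python
import numpy as np

def autocorrelation_estimates(x, max_lag):
    """Return (unbiased, biased) autocorrelation estimates for lags 0..max_lag.

    unbiased[n] divides the lagged-product sum by N - n + 1, as in (6-15);
    biased[n] divides the same sum by N + 1, as in (6-16).
    """
    x = np.asarray(x, dtype=float)
    big_n = len(x) - 1                      # samples are x_0 ... x_N
    unbiased = np.empty(max_lag + 1)
    biased = np.empty(max_lag + 1)
    for n in range(max_lag + 1):
        # sum over k = 0 .. N - n of x_k * x_{k+n}
        lagged_sum = np.dot(x[:big_n - n + 1], x[n:])
        unbiased[n] = lagged_sum / (big_n - n + 1)
        biased[n] = lagged_sum / (big_n + 1)
    return unbiased, biased

# Tiny made-up record, only to show the two normalizations side by side.
unbiased, biased = autocorrelation_estimates([1.0, 2.0, 3.0, 4.0], max_lag=2)
print(unbiased)
print(biased)
```

The ratio `biased[n] / unbiased[n]` equals $1 - n/(N+1)$, which is the bias factor derived above.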

It is much more difficult to determine the variance of the estimate, and the
details of this are beyond the scope of the present discussion. It is possible to
show, however, that the variance of the estimate must be smaller than

$$\operatorname{Var}[\hat{R}_X(n\,\Delta t)] \le \frac{2}{N} \sum_{k=-M}^{M} R_X^2(k\,\Delta t) \tag{6-17}$$

This expression for the variance assumes that the $2M + 1$ estimated values of the
autocorrelation function span the region in which the autocorrelation function has
a significant amplitude. If the value of $(2M + 1)\Delta t$ is too small, the variance
given by (6-17) may be too small. If the mathematical form of the autocorrelation
function is known, or can be deduced from the measurements that are made, a
more accurate measure of the variance of the estimate is

$$\operatorname{Var}[\hat{R}_X(n\,\Delta t)] \le \frac{2}{T} \int_{-\infty}^{\infty} R_X^2(\tau)\,d\tau \tag{6-18}$$

where $T = N\,\Delta t$ is the length of the observed sample.
As an illustration of what this result means in terms of the number of samples
required for a given degree of accuracy, suppose that it is desired to estimate a
correlation function of the form shown in Figure 6-2 with 4 points on either side
of center ($M = 4$). If an rms error of 5 percent¹ or less is required, then (6-17)
implies that (since $t_a = 4\,\Delta t$)

$$(0.05A^2)^2 \ge \frac{2}{N} \sum_{k=-4}^{4} A^4 \left[1 - \frac{|k|\,\Delta t}{4\,\Delta t}\right]^2$$

This can be solved for $N$ to obtain

$$N \ge 2200$$


It is clear that long samples of data and extensive calculations are necessary if 
accurate estimates of correlation functions are to be made. 
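
The arithmetic behind this sample-count figure can be reproduced directly from (6-17). The sketch below works with the autocorrelation normalized by its peak value $A^2$, so that $A$ cancels; the function name and the normalization are choices made here for illustration.

```python
def samples_for_rms_error(normalized_r, rms_fraction):
    """Solve (rms_fraction)^2 = (2 / N) * sum(r_k^2) for N.

    normalized_r holds R(k * dt) / R(0) for k = -M .. M, and rms_fraction
    is the allowed rms error as a fraction of R(0).
    """
    sum_of_squares = sum(r * r for r in normalized_r)
    return 2.0 * sum_of_squares / rms_fraction ** 2

# Triangular autocorrelation with t_a = 4 * dt, sampled at k = -4 .. 4.
M = 4
triangle = [1.0 - abs(k) / M for k in range(-M, M + 1)]

n_required = samples_for_rms_error(triangle, 0.05)
print(round(n_required))
```

The sum of squares is 2.75, so the required record length is $2 \times 2.75 / 0.0025$ samples.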





Exercise 6—4.1 


An ergodic random process has an autocorrelation function of the
form

$$R_X(\tau) = 10 \left(\frac{\sin \pi\tau}{\pi\tau}\right)^2$$

a) Over what range of $\tau$-values must the autocorrelation function
of this process be estimated in order to include the first two
zeros of the autocorrelation function?


b) If 21 estimates ($M = 20$) of the autocorrelation function are to
be made in the interval specified in (a), what should the
sampling interval be?


c) How many sample values of the random process are required 
so that the rms error of the estimate is less than 5 percent of 
the true maximum value of the autocorrelation function? 


Answers: 0.1, 2, 5331


¹This implies that the standard deviation of the estimate should be no greater than 5 percent of the true
mean value of the random variable $\hat{R}_X(n\,\Delta t)$.


Exercise 6-4.2

Using the variance bound given by the integral of (6-18), find the
number of sample points required for the autocorrelation function
estimate of Exercise 6-4.1.


Answer: 5333 





6—5 Examples of Autocorrelation Functions 


Before going on to consider crosscorrelation functions it is worthwhile to look at 
some typical autocorrelation functions, suggest the circumstances under which 
they might arise, and list possible applications. This discussion is not intended to 
be exhaustive but is intended primarily to introduce some ideas. 

The triangular correlation function shown in Figure 6—2 is typical of random 
binary signals in which the switching must occur at uniformly spaced time inter- 
vals. Such a signal arises in many types of communication and control systems in 
which the continuous signals are sampled at periodic instants of time and the 
resulting sample amplitudes converted to binary numbers. The correlation function 
shown in Figure 6—2 assumes that the random process has a mean value of zero, 
but this is not always the case. If, for example, the random signal could assume
values of $A$ and 0 (rather than $-A$) then the process has a mean value of $A/2$ and
a mean-square value of $A^2/2$. The resulting autocorrelation function, shown in
Figure 6-3, follows from an application of (6-9).
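
The application of (6-9) mentioned here can be written out as a small function. The sketch assumes, as in the text, a binary signal taking the values $A$ and 0 with equal probability and switching at intervals of $t_a$: the mean is $A/2$, and the zero-mean part is a $\pm A/2$ binary signal with a triangular autocorrelation of peak value $A^2/4$. The function and argument names are hypothetical.

```python
def binary_autocorrelation_nonzero_mean(tau, amplitude, t_a):
    """Autocorrelation of a binary process with levels `amplitude` and 0.

    Follows (6-9): squared mean plus the autocorrelation of the
    zero-mean component, which is triangular with base 2 * t_a.
    """
    mean = amplitude / 2.0
    variance = amplitude ** 2 / 4.0          # mean-square A^2/2 minus squared mean A^2/4
    triangle = max(0.0, 1.0 - abs(tau) / t_a)
    return mean ** 2 + variance * triangle

# With A = 2 and t_a = 1: R(0) = A^2 / 2 and R(tau) = A^2 / 4 for |tau| >= t_a.
print(binary_autocorrelation_nonzero_mean(0.0, 2.0, 1.0))
print(binary_autocorrelation_nonzero_mean(0.5, 2.0, 1.0))
print(binary_autocorrelation_nonzero_mean(3.0, 2.0, 1.0))
```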

Not all binary time functions have triangular autocorrelation functions, however.
For example, another common type of binary signal is one in which the
switching occurs at randomly spaced instants of time. If all times are equally
probable, then the probability density function associated with the duration of each
interval is exponential, as shown in Section 2-7. The resulting autocorrelation
function is also exponential, as shown in Figure 6-4. The usual mathematical
representation of such an autocorrelation function is

$$R_X(\tau) = A^2 e^{-\alpha|\tau|} \tag{6-19}$$

where $\alpha$ is the average number of intervals per second.

Figure 6-3 Autocorrelation function of a binary process with a non-zero mean
value.

Figure 6-4 (a) A binary signal with randomly spaced switching times and (b) the
corresponding autocorrelation function, $R_X(\tau) = A^2 e^{-\alpha|\tau|}$.

Binary signals and correlation functions of the type shown in Figure 6—4 fre- 
quently arise in connection with radioactive monitoring devices. The randomly 
occurring pulses at the output of a particle detector are used to trigger a flip-flop 
circuit that generates the binary signal. This type of signal is a convenient one for 
measuring either the average time interval between particles or the average rate of 
occurrence. It is usually referred to in the literature as the Random Telegraph 
Wave. 

Nonbinary signals can also have exponential correlation functions. For exam- 
ple, if very wideband noise (having almost any probability density function) is 
passed through a low pass RC filter, the signal appearing at the output of the filter 
will have a nearly exponential autocorrelation function. This result is shown in 
detail in Chapter 8. 

Both the triangular autocorrelation function and the exponential autocorrelation 
function share one feature that is worth noting. That is, in both cases the autocor- 
relation function has a discontinuous derivative at the origin. Random processes 
whose autocorrelation functions have this property are said to be nondifferentia- 
ble. A nondifferentiable process is one whose derivative has an infinite variance. 
For example, if a random voltage having an exponential autocorrelation function 
is applied to a capacitor, the resulting current is proportional to the derivative of 
the voltage, and this current would have an infinite variance. Since this doesn’t 
make sense on a physical basis, the implication is that random processes having 
truly triangular or truly exponential autocorrelation functions cannot exist in the 
real world. In spite of this conclusion, which is indeed true, both the triangular
and exponential autocorrelation functions provide useful models in many
situations. One must be careful, however, that these models are not used in any
situation in which the derivative of the random process is needed because the
resulting calculation is almost certain to be wrong.

Figure 6-5 The autocorrelation functions arising at the outputs of (a) a bandpass
filter and (b) an ideal low pass filter.

All of the correlation functions discussed so far have been positive for all values
of $\tau$. This is not necessary, however, and two common types of autocorrelation
functions that have negative regions are given by

$$R_X(\tau) = A^2 e^{-\alpha|\tau|} \cos \beta\tau \tag{6-20}$$

and

$$R_X(\tau) = \frac{A^2 \sin \pi\gamma\tau}{\pi\gamma\tau} \tag{6-21}$$

and are illustrated in Figure 6-5. The autocorrelation function of (6-20) arises at
the output of a narrow-band bandpass filter whose input is very wideband noise,
while that of (6-21) is typical of the autocorrelation at the output of an ideal
low-pass filter. Both of these results will be derived in Chapters 7 and 8.

Although there are many other types of autocorrelation functions that arise in 
connection with signal and system analysis, the few discussed here are the ones 
most commonly encountered. The student should refer to the properties of auto- 
correlation functions discussed in Section 6—3 and verify that all these correlation 
functions possess those properties. 





Exercise 6—5.1 


a) Determine whether each of the random processes described by
the autocorrelation functions of (6-20) and (6-21) is
differentiable.

b) Indicate whether the following statement is true or false: The
product of a function that is differentiable at the origin and a
function that is nondifferentiable at the origin is always
differentiable. Test your conclusion on the autocorrelation function
of (6-20).


Answers: Yes, yes, true 


Exercise 6-5.2 


Which of the following functions of $\tau$ cannot be valid mathematical
models for autocorrelation functions? Explain why.

a) Bs 

b) [de^ ^^ 


c) 1080 "^-^ 





e) 44 





Answers: b,c, e are not valid models. 





6—6 Crosscorrelation Functions 


It is also possible to consider the correlation between two random variables from
different random processes. This situation arises when there is more than one
random signal being applied to a system or when one wishes to compare random
voltages or currents occurring at different points in the system. If the random
processes are jointly stationary in the wide sense, and if sample functions from
these processes are designated as $X(t)$ and $Y(t)$, then for two random variables

$$X_1 = X(t_1)$$

$$Y_2 = Y(t_1 + \tau)$$

it is possible to define the crosscorrelation function

$$R_{XY}(\tau) = E[X_1Y_2] = \int_{-\infty}^{\infty} dx_1 \int_{-\infty}^{\infty} x_1 y_2\,f(x_1, y_2)\,dy_2 \tag{6-22}$$

The order of subscripts is significant; the second subscript refers to the random
variable taken at $(t_1 + \tau)$.*

There is also another crosscorrelation function that can be defined for the same
two time instants. Thus, let

$$Y_1 = Y(t_1)$$

$$X_2 = X(t_1 + \tau)$$

and define

$$R_{YX}(\tau) = E[Y_1X_2] = \int_{-\infty}^{\infty} dy_1 \int_{-\infty}^{\infty} y_1 x_2\,f(y_1, x_2)\,dx_2 \tag{6-23}$$

Note that because both random processes are assumed to be jointly stationary,
these crosscorrelation functions depend only upon the time difference $\tau$.

It is important that the processes be jointly stationary and not just individually
stationary. It is quite possible to have two individually stationary random
processes that are not jointly stationary. In such a case, the crosscorrelation function
depends upon time, as well as the time difference $\tau$.

The time crosscorrelation functions may be defined as before for a particular
pair of sample functions as

$$\mathcal{R}_{xy}(\tau) = \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} x(t)\,y(t + \tau)\,dt \tag{6-24}$$

and

$$\mathcal{R}_{yx}(\tau) = \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} y(t)\,x(t + \tau)\,dt \tag{6-25}$$

If the random processes are jointly ergodic, then (6-24) and (6-25) yield the same
value for every pair of sample functions. Hence, for ergodic processes,

$$\mathcal{R}_{xy}(\tau) = R_{XY}(\tau) \tag{6-26}$$

$$\mathcal{R}_{yx}(\tau) = R_{YX}(\tau) \tag{6-27}$$


In general, the physical interpretation of crosscorrelation functions is no more 
concrete than that of autocorrelation functions. It is simply a measure of how 
much these two random variables depend upon one another. In the later study of 
system analysis, however, the specific crosscorrelation function between system 
input and output will take on a very definite and important physical significance. 
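
For periodic sample functions the infinite-time averages in (6-24) and (6-25) can be replaced by an average over one period, which makes them easy to evaluate numerically. The sketch below assumes NumPy; the amplitudes, the frequency, and the helper name are illustrative. For $x(t) = A\cos\omega t$ and $y(t) = B\sin\omega t$, a short trigonometric calculation gives a time crosscorrelation of $\tfrac{1}{2}AB\sin\omega\tau$, which the code checks at one lag.

```python
import numpy as np

def time_crosscorrelation(first, second, tau, period, n=1000):
    """Average first(t) * second(t + tau) over one period.

    A uniform grid over exactly one period is used, so for sinusoids
    the average matches the infinite-time limit in (6-24).
    """
    t = np.linspace(0.0, period, n, endpoint=False)
    return np.mean(first(t) * second(t + tau))

# Hypothetical sample functions with a period of 1 second.
A, B, w = 2.0, 3.0, 2.0 * np.pi
x = lambda t: A * np.cos(w * t)
y = lambda t: B * np.sin(w * t)

tau = 0.1
r_xy = time_crosscorrelation(x, y, tau, period=1.0)
r_yx = time_crosscorrelation(y, x, tau, period=1.0)
r_xy_negative_lag = time_crosscorrelation(x, y, -tau, period=1.0)

print(r_xy, 0.5 * A * B * np.sin(w * tau))   # the two values should agree
print(r_yx, r_xy_negative_lag)               # reversing the order mirrors the lag
```

In this example the two crosscorrelation functions are not even functions of $\tau$, and interchanging the order of the subscripts is equivalent to reversing the sign of $\tau$.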


*This is an arbitrary convention, which is by no means universal with all authors. The definitions
should be checked in every case.


Exercise 6-6.1

Two jointly stationary random processes have sample functions of the
form

$$X(t) = 5 \cos(10t + \theta)$$

and

$$Y(t) = 20 \sin(10t + \theta)$$

where $\theta$ is a random variable that is uniformly distributed from 0 to $2\pi$.
Find the crosscorrelation function $R_{XY}(\tau)$ for these two processes.


Answer: $50 \sin 10\tau$


Exercise 6—6.2 


Two sample functions from two random processes have the form

$$x(t) = 5 \cos 10t$$

and

$$y(t) = 20 \sin 10t$$

Find the time crosscorrelation function for $x(t)$ and $y(t + \tau)$.


Answer: $50 \sin 10\tau$





6—7 Properties of Crosscorrelation Functions 


The general properties of all crosscorrelation functions are quite different from 
those of autocorrelation functions. They may be summarized as follows: 


1. The quantities $R_{XY}(0)$ and $R_{YX}(0)$ have no particular physical significance
and do not represent mean-square values. It is true, however, that
$R_{XY}(0) = R_{YX}(0)$.

2. Crosscorrelation functions are not generally even functions of $\tau$. There is a
type of symmetry, however, as indicated by the relation

$$R_{YX}(\tau) = R_{XY}(-\tau) \tag{6-28}$$

This result follows from the fact that a shift of $Y(t)$ in one direction (in
time) is equivalent to a shift of $X(t)$ in the other direction.

3. The crosscorrelation function does not necessarily have its maximum value
at $\tau = 0$. It can be shown, however, that

$$|R_{XY}(\tau)| \le [R_X(0)R_Y(0)]^{1/2} \tag{6-29}$$

with a similar relationship for $R_{YX}(\tau)$. The maximum of the crosscorrelation
function can occur anywhere, but it cannot exceed the above value.
Furthermore, it may not achieve this value anywhere.

4. If the two random processes are statistically independent, then

$$R_{XY}(\tau) = E[X_1Y_2] = E[X_1]E[Y_2] = \bar{X}\,\bar{Y} = R_{YX}(\tau) \tag{6-30}$$

If, in addition, either process has zero mean, then the crosscorrelation
function vanishes for all $\tau$. The converse of this is not necessarily true,
however. The fact that the crosscorrelation function is zero and that one
process has zero mean does not imply that the random processes are
statistically independent, except for jointly Gaussian random variables.

5. If $X(t)$ is a stationary random process and $\dot{X}(t)$ is its derivative with respect
to time, the crosscorrelation function of $X(t)$ and $\dot{X}(t)$ is given by

$$R_{X\dot{X}}(\tau) = \frac{dR_X(\tau)}{d\tau} \tag{6-31}$$

in which the right side of (6-31) is the derivative of the autocorrelation
function with respect to $\tau$. This is easily shown by employing the
fundamental definition of a derivative

$$\dot{X}(t) = \lim_{\varepsilon \to 0} \frac{X(t + \varepsilon) - X(t)}{\varepsilon}$$

Hence,

$$\begin{aligned}
R_{X\dot{X}}(\tau) &= E[X(t)\dot{X}(t + \tau)] \\
&= E\left\{\lim_{\varepsilon \to 0} \frac{X(t)X(t + \tau + \varepsilon) - X(t)X(t + \tau)}{\varepsilon}\right\} \\
&= \lim_{\varepsilon \to 0} \frac{R_X(\tau + \varepsilon) - R_X(\tau)}{\varepsilon} = \frac{dR_X(\tau)}{d\tau}
\end{aligned}$$

The interchange of the limit operation and the expectation is permissible
whenever $\dot{X}(t)$ exists. If the above process is repeated, it is also possible to
show that the autocorrelation function of $\dot{X}(t)$ is

$$R_{\dot{X}}(\tau) = R_{\dot{X}\dot{X}}(\tau) = -\frac{d^2R_X(\tau)}{d\tau^2} \tag{6-32}$$

where the derivative on the right side is the second derivative of the basic
autocorrelation function with respect to $\tau$.

6. It is worth noting that the requirements for the existence of
crosscorrelation functions are more relaxed than those for the existence of
autocorrelation functions. Crosscorrelation functions are generally not even
functions of $\tau$, their Fourier transforms do not have to be positive for all values
of $\omega$, and it is not even necessary that the Fourier transforms be real. These
latter two points are discussed in more detail in the next chapter.





Exercise 6-7.1 


Prove the inequality shown in Equation (6-29). This is most easily
done by evaluating the expected value of the quantity

$$\left[\frac{X_1}{\sqrt{R_X(0)}} \pm \frac{Y_2}{\sqrt{R_Y(0)}}\right]^2$$


Exercise 6-7.2 


Two random processes have sample functions of the form

$$X(t) = A \cos(\omega_0 t + \theta) \quad \text{and} \quad Y(t) = B \sin(\omega_0 t + \theta)$$

where $\theta$ is a random variable that is uniformly distributed between 0
and $2\pi$ and $A$ and $B$ are constants.


a) Find the crosscorrelation functions $R_{XY}(\tau)$ and $R_{YX}(\tau)$.


b) What is the significance of the values of these crosscorrelation
functions at $\tau = 0$?


Answer: $\pm\dfrac{AB}{2} \sin \omega_0\tau$





6—8 Examples and Applications of 
Crosscorrelation Functions 


It was noted previously that one of the applications of crosscorrelation functions is
in connection with systems with two or more random inputs. In order to explore
this in more detail, consider a random process whose sample functions are of the
form

$$Z(t) = X(t) + Y(t)$$

in which $X(t)$ and $Y(t)$ are also sample functions of random processes. Then
defining the random variables as

$$Z_1 = X_1 + Y_1 = X(t_1) + Y(t_1)$$

$$Z_2 = X_2 + Y_2 = X(t_1 + \tau) + Y(t_1 + \tau)$$

the autocorrelation function of $Z(t)$ is

$$\begin{aligned}
R_Z(\tau) &= E[Z_1Z_2] = E[(X_1 + Y_1)(X_2 + Y_2)] \\
&= E[X_1X_2 + Y_1Y_2 + X_1Y_2 + Y_1X_2] \\
&= R_X(\tau) + R_Y(\tau) + R_{XY}(\tau) + R_{YX}(\tau)
\end{aligned} \tag{6-33}$$

This result is easily extended to the sum of any number of random variables. In
general, the autocorrelation function of such a sum will be the sum of all the
autocorrelation functions plus the sum of all the crosscorrelation functions.

If the two random processes being considered are statistically independent and 
one of them has zero mean, then both of the crosscorrelation functions in (6-33) 
vanish and the autocorrelation function of the sum is just the sum of the autocor- 
relation functions. An example of the importance of this result arises in connection 
with the extraction of periodic signals from random noise. Let X(t) be a desired 
signal sample function of the form 


$$X(t) = A \cos(\omega t + \theta) \tag{6-34}$$

where $\theta$ is a random variable. It was shown previously that the autocorrelation
function of this process is

$$R_X(\tau) = \tfrac{1}{2}A^2 \cos \omega\tau$$

Next, let $Y(t)$ be a sample function of zero-mean random noise that is statistically
independent of the signal and specify that it has an autocorrelation function of the
form

$$R_Y(\tau) = B^2 e^{-\alpha|\tau|}$$

The observed quantity is $Z(t)$, which from (6-33) has an autocorrelation
function of

$$\begin{aligned}
R_Z(\tau) &= R_X(\tau) + R_Y(\tau) \\
&= \tfrac{1}{2}A^2 \cos \omega\tau + B^2 e^{-\alpha|\tau|}
\end{aligned} \tag{6-35}$$






Figure 6-6 Autocorrelation function of sinusoidal signal plus noise. 


This function is sketched in Figure 6-6 for a case in which the average noise
power, $\overline{Y^2}$, is much larger than the average signal power, $\tfrac{1}{2}A^2$. It is clear from the
sketch that for large values of $\tau$, the autocorrelation function depends mostly upon
the signal, since the noise autocorrelation function tends to zero as $\tau$ tends to
infinity. Thus, it should be possible to extract tiny amounts of sinusoidal signal
from large amounts of noise by using an appropriate method for measuring the
autocorrelation function of the received signal plus noise.
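
A few numbers make the point concrete. The sketch below evaluates the two terms of an autocorrelation of the form $\tfrac{1}{2}A^2\cos\omega\tau + B^2e^{-\alpha|\tau|}$ separately; the parameter values (noise power 200 times the signal power) and the function name are assumptions chosen only for illustration.

```python
import math

def signal_plus_noise_autocorrelation(tau, A, B, alpha, omega):
    """Return (total, signal_term, noise_term) for a sinusoid in exponential-type noise."""
    signal_term = 0.5 * A ** 2 * math.cos(omega * tau)
    noise_term = B ** 2 * math.exp(-alpha * abs(tau))
    return signal_term + noise_term, signal_term, noise_term

# Hypothetical values: signal power 0.5, noise power 100.
A, B, alpha, omega = 1.0, 10.0, 1.0, 2.0 * math.pi

total_0, signal_0, noise_0 = signal_plus_noise_autocorrelation(0.0, A, B, alpha, omega)
total_12, signal_12, noise_12 = signal_plus_noise_autocorrelation(12.0, A, B, alpha, omega)

print(total_0, signal_0, noise_0)      # at the origin the noise term dominates
print(total_12, signal_12, noise_12)   # at a large lag essentially only the sinusoid remains
```

At the origin the noise term swamps the signal term, whereas twelve noise time constants later the noise contribution is a small fraction of a percent of the sinusoidal term.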

Another method of extracting a small known signal from a combination of sig- 
nal and noise is to perform a crosscorrelation operation. A typical example of this 
might be a radar system that is transmitting a signal X(t). The signal that is re- 
turned from any target is a very much smaller version of X(t) and has been de- 
layed in time by the propagation time to the target and back. Since noise is always 
present at the input to the radar receiver, the total received signal Y(t) may be 
represented as 


$$Y(t) = aX(t - \tau_1) + N(t) \tag{6-36}$$

where $a$ is a number very much smaller than 1, $\tau_1$ is the round-trip delay time of
the signal, and $N(t)$ is the receiver noise. In a typical situation the average power
of the returned signal, $aX(t - \tau_1)$, is very much smaller than the average power
of the noise, $N(t)$.

The crosscorrelation function of the transmitted signal and the total receiver 
input is 


$$\begin{aligned}
R_{XY}(\tau) &= E[X(t)Y(t + \tau)] \\
&= E[aX(t)X(t + \tau - \tau_1) + X(t)N(t + \tau)] \\
&= aR_X(\tau - \tau_1) + R_{XN}(\tau)
\end{aligned} \tag{6-37}$$

Since the signal and noise are statistically independent and have zero mean
(because they are RF bandpass signals), the crosscorrelation function between $X(t)$
and $N(t)$ is zero for all values of $\tau$. Thus, (6-37) becomes

$$R_{XY}(\tau) = aR_X(\tau - \tau_1) \tag{6-38}$$

Remembering that autocorrelation functions have their maximum values at the
origin, it is clear that if $\tau$ is adjusted so that the measured value of $R_{XY}(\tau)$ is a
maximum, then $\tau = \tau_1$ and this value indicates the distance to the target.
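
The delay-finding idea can be tried on simulated data. The following is a rough sketch under several assumptions that are not in the text: the transmitted signal is modeled as a white Gaussian sequence, the delay is a whole number of samples, the noise is white and Gaussian, and the crosscorrelation is estimated with a biased-type sum. NumPy is assumed, and all names and parameter values are hypothetical.

```python
import numpy as np

def estimate_delay(x, y, max_lag):
    """Estimate the lag (in samples) that maximizes the crosscorrelation estimate.

    The estimate of E[X(t) Y(t + lag)] is the lagged-product sum divided
    by the record length, for lag = 0 .. max_lag.
    """
    n = len(x)
    lags = np.arange(max_lag + 1)
    r_xy = np.array([np.dot(x[:n - lag], y[lag:]) / n for lag in lags])
    return int(lags[np.argmax(r_xy)]), r_xy

rng = np.random.default_rng(0)           # fixed seed so the run is repeatable
n, true_delay, a = 20000, 37, 0.2        # returned power is 4 percent of the noise power

x = rng.standard_normal(n)               # assumed model of the transmitted signal
y = np.zeros(n)
y[true_delay:] = a * x[:n - true_delay]  # attenuated, delayed return
y += rng.standard_normal(n)              # receiver noise, independent of x

delay, r_xy = estimate_delay(x, y, max_lag=100)
print("estimated delay in samples:", delay)
```

Even though the returned signal is far weaker than the noise, the peak of the estimated crosscorrelation stands well above the fluctuations at other lags, provided the record is long enough.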

In some situations involving two random processes it is possible to observe both 
the sum and the difference of the two processes but not each one individually. In 
this case, one may be interested in the crosscorrelation between the sum and dif- 
ference as a means of learning something about them. Suppose, for example, that 
we have available two processes described by 


U(t) = X(t) + Y(t) (6-39) 
V(t) = X(t) — Y(t) (6—40) 


in which $X(t)$ and $Y(t)$ are not necessarily zero mean nor statistically independent.
The crosscorrelation function between $U(t)$ and $V(t)$ is

$$\begin{aligned}
R_{UV}(\tau) &= E[U(t)V(t + \tau)] \\
&= E\{[X(t) + Y(t)][X(t + \tau) - Y(t + \tau)]\} \\
&= E[X(t)X(t + \tau) + Y(t)X(t + \tau) - X(t)Y(t + \tau) - Y(t)Y(t + \tau)]
\end{aligned} \tag{6-41}$$

Each of the expected values in (6-41) may be identified as an autocorrelation
function or a crosscorrelation function. Thus,

$$R_{UV}(\tau) = R_X(\tau) + R_{YX}(\tau) - R_{XY}(\tau) - R_Y(\tau) \tag{6-42}$$

In a similar way, the reader may verify easily that the other crosscorrelation
function is

$$R_{VU}(\tau) = R_X(\tau) - R_{YX}(\tau) + R_{XY}(\tau) - R_Y(\tau) \tag{6-43}$$

If both $X$ and $Y$ are zero mean and statistically independent, both crosscorrelation
functions reduce to the same function, namely

$$R_{UV}(\tau) = R_{VU}(\tau) = R_X(\tau) - R_Y(\tau) \tag{6-44}$$
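
The special case of independent, zero-mean processes can be checked by simulation. The sketch below is illustrative only: it assumes NumPy, models $X$ and $Y$ as independent white Gaussian sequences with variances 4 and 1 (so the difference of their autocorrelations is 3 at zero lag and 0 at any other lag), and uses a hypothetical helper for the crosscorrelation estimate.

```python
import numpy as np

def cross_correlation_estimate(first, second, lag):
    """Estimate E[first(t) * second(t + lag)] for a non-negative integer lag."""
    m = len(first) - lag
    return np.dot(first[:m], second[lag:]) / m

rng = np.random.default_rng(1)           # fixed seed so the run is repeatable
n = 200_000
x = 2.0 * rng.standard_normal(n)         # zero mean, variance 4
y = rng.standard_normal(n)               # zero mean, variance 1, independent of x

u = x + y                                # observable sum
v = x - y                                # observable difference

r_uv_0 = cross_correlation_estimate(u, v, 0)
r_vu_0 = cross_correlation_estimate(v, u, 0)
r_uv_5 = cross_correlation_estimate(u, v, 5)

print(r_uv_0, r_vu_0)   # both should be close to 4 - 1 = 3
print(r_uv_5)           # should be close to 0 for these white sequences
```

The two crosscorrelation estimates agree with each other and with the difference of the two variances, as the reduced form for independent zero-mean processes predicts.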


The actual measurement of crosscorrelation functions can be carried out in 
much the same way as that suggested for measuring autocorrelation functions in 
Section 6—4. This type of measurement is still unbiased when crosscorrelation 
functions are being considered, but the result given in (6—17) for the variance of 
the estimate is no longer strictly true—particularly if one of the signals contains 
additive uncorrelated noise, as in the radar example just discussed. Generally 
speaking, the number of samples required to obtain a given variance in the esti- 
mate of a crosscorrelation function is much greater than that required for an au- 
tocorrelation function. 
Exercise 6-8.1


A random process has sample functions of the form X(t) = A in which
A is a random variable that has a mean value of 10 and a variance of
25. Sample functions from this process can be observed only in the
presence of independent noise having an autocorrelation function of


\[ R_N(\tau) = 100 \exp(-10|\tau|) \]


a) Find the autocorrelation function of the sum of these two processes.


b) If the autocorrelation function of the sum is observed, find the
value of $\tau$ at which this autocorrelation function is within 1% of
its value at $\tau = \infty$.


Answers: 0.439, $125 + 100 \exp(-10|\tau|)$
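The arithmetic behind these answers can be checked with a short sketch. It assumes that the noise is zero mean (its autocorrelation function decays to zero) and reads "within 1%" as the excess over the limiting value being 1% of that limit; both readings are assumptions. Under them the lag comes out near 0.438, close to the listed 0.439.

```python
import math

mean_a, var_a = 10.0, 25.0
mean_square_a = var_a + mean_a ** 2      # E[A^2] = 125

def r_sum(tau):
    """Autocorrelation of signal plus independent zero-mean noise."""
    return mean_square_a + 100.0 * math.exp(-10.0 * abs(tau))

limit = mean_square_a                     # value approached as tau grows
# Solve 100 exp(-10 tau) = 0.01 * limit for tau.
tau_1pct = math.log(100.0 / (0.01 * limit)) / 10.0
print(round(tau_1pct, 3))
```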


Exercise 6—8.2 


A random binary process such as that described in Section 6-2 has
sample functions with amplitudes of ±10 and $t_a = 0.01$. It is applied
to the half-wave rectifier circuit shown below.


[Figure: half-wave rectifier circuit with an ideal diode and a resistor]


a) Find the autocorrelation function of the output, $R_Y(\tau)$.
b) Find the crosscorrelation function $R_{XY}(\tau)$.


c) Find the crosscorrelation function $R_{YX}(\tau)$.


Answers: $25 + 25\left[1 - \frac{|\tau|}{0.01}\right]$, $50\left[1 - \frac{|\tau|}{0.01}\right]$


6-9 Correlation Matrices for Sampled Functions


The discussion of correlation thus far has concentrated on only two random variables.
Thus, for stationary processes the correlation functions can be expressed as
a function of the single variable $\tau$. There are many practical situations, however,
in which there may be many random variables and it is necessary to develop some
convenient method for representing the many autocorrelations and crosscorrelations
that arise. The use of vector notation provides a convenient way of representing
a set of random variables, and the product of vectors that is necessary to
obtain correlations results in a matrix. It is important, therefore, to discuss some
situations in which the vector representation is useful and to describe some of the
properties of the resulting correlation matrices. A situation in which vector notation
is useful in representing a signal arises in the case of a single time function
that is sampled at periodic time instants. If only a finite number of such samples
are to be considered, say N, then each sample value can become a component of
an (N × 1) vector. Thus, if the sampling times are $t_1, t_2, \ldots, t_N$, the vector
representing the time function X(t) may be expressed as


\[
\mathbf{X} = \begin{bmatrix} X(t_1) \\ X(t_2) \\ \vdots \\ X(t_N) \end{bmatrix}
\]


If X(t) is a sample function from a random process, then each of the components 
of the vector X is a random variable. 

It is now possible to define a correlation matrix that is (N × N) and gives the
correlation between every pair of random variables. Thus,


\[
\mathbf{R}_X = E[\mathbf{X}\mathbf{X}^T] = E\begin{bmatrix}
X(t_1)X(t_1) & X(t_1)X(t_2) & \cdots & X(t_1)X(t_N) \\
X(t_2)X(t_1) & X(t_2)X(t_2) & & \vdots \\
\vdots & & \ddots & \\
X(t_N)X(t_1) & \cdots & & X(t_N)X(t_N)
\end{bmatrix}
\]


where $\mathbf{X}^T$ is the transpose of $\mathbf{X}$. When the expected value of each element of the
matrix is taken, that element becomes a particular value of the autocorrelation
function of the random process from which X(t) came. Thus,


\[
\mathbf{R}_X = \begin{bmatrix}
R_X(t_1, t_1) & R_X(t_1, t_2) & \cdots & R_X(t_1, t_N) \\
R_X(t_2, t_1) & R_X(t_2, t_2) & & \vdots \\
\vdots & & \ddots & \\
R_X(t_N, t_1) & \cdots & & R_X(t_N, t_N)
\end{bmatrix}
\tag{6-45}
\]
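When the random process from which X(t) came is wide-sense stationary, then
all the components of $\mathbf{R}_X$ become functions of time difference only. If the interval
between sample values is $\Delta t$, then


\[
\begin{aligned}
t_2 &= t_1 + \Delta t \\
t_3 &= t_1 + 2\,\Delta t \\
&\;\;\vdots \\
t_N &= t_1 + (N - 1)\,\Delta t
\end{aligned}
\]


and


\[
\mathbf{R}_X = \begin{bmatrix}
R_X[0] & R_X[\Delta t] & \cdots & R_X[(N-1)\Delta t] \\
R_X[\Delta t] & R_X[0] & & \vdots \\
\vdots & & \ddots & \\
R_X[(N-1)\Delta t] & \cdots & & R_X[0]
\end{bmatrix}
\tag{6-46}
\]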


where use has been made of the symmetry of the autocorrelation function; that is,
$R_X[i\,\Delta t] = R_X[-i\,\Delta t]$. Note that as a consequence of the symmetry, $\mathbf{R}_X$ is a
symmetric matrix (even in the nonstationary case), and that as a consequence of
stationarity, the major diagonal (and all diagonals parallel to it) have identical
elements.
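The structure of (6-46) can be sketched in a few lines of NumPy: each element depends only on the lag |i − j|Δt. The exponential autocorrelation function, the sample count, and the sampling interval used here are hypothetical values chosen only to illustrate the construction.

```python
import numpy as np

def correlation_matrix(r_x, n_samples, dt):
    """Build the (N x N) matrix of (6-46) from an autocorrelation function.

    r_x must accept a NumPy array of lags and return R_X at those lags.
    """
    idx = np.arange(n_samples)
    # Element (i, j) corresponds to the lag |i - j| * dt.
    lags = np.abs(idx[:, None] - idx[None, :]) * dt
    return r_x(lags)

# Hypothetical autocorrelation function (an assumption for illustration).
def r_x(tau):
    return 5.0 * np.exp(-2.0 * np.abs(tau))

R = correlation_matrix(r_x, n_samples=4, dt=0.1)
print(R)
```

Printing R shows the symmetric pattern with equal elements along every diagonal, as the text describes for the stationary case.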

Although the $\mathbf{R}_X$ just defined is a logical consequence of previous definitions, it
is not the most customary way of designating the correlation matrix of a random
vector consisting of sample values. A more common procedure is to define a
covariance matrix, which contains the variances and covariances of the random
variables. The general covariance between two random variables is defined as


\[ E\{[X(t_i) - \bar{X}(t_i)][X(t_j) - \bar{X}(t_j)]\} = \sigma_i \sigma_j \rho_{ij} \tag{6-47} \]

where
    $\bar{X}(t_i)$ = mean value of $X(t_i)$
    $\bar{X}(t_j)$ = mean value of $X(t_j)$
    $\sigma_i^2$ = variance of $X(t_i)$
    $\sigma_j^2$ = variance of $X(t_j)$
    $\rho_{ij}$ = normalized covariance coefficient of $X(t_i)$ and $X(t_j)$
    $\rho_{ij}$ = 1, when i = j

The covariance matrix is defined as

\[ \mathbf{\Lambda}_X = E[(\mathbf{X} - \bar{\mathbf{X}})(\mathbf{X}^T - \bar{\mathbf{X}}^T)] \tag{6-48} \]


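where $\bar{\mathbf{X}}$ is the mean value of $\mathbf{X}$. Using the covariance definitions leads immediately
to


\[
\mathbf{\Lambda}_X = \begin{bmatrix}
\sigma_1^2 & \sigma_1\sigma_2\rho_{12} & \cdots & \sigma_1\sigma_N\rho_{1N} \\
\sigma_2\sigma_1\rho_{21} & \sigma_2^2 & & \vdots \\
\vdots & & \ddots & \\
\sigma_N\sigma_1\rho_{N1} & \cdots & & \sigma_N^2
\end{bmatrix}
\tag{6-49}
\]

since $\rho_{ii} = 1$, for i = 1, 2, . . ., N. By expanding (6-49) it is easy to show
that $\mathbf{\Lambda}_X$ is related to $\mathbf{R}_X$ by

\[ \mathbf{\Lambda}_X = \mathbf{R}_X - \bar{\mathbf{X}}\bar{\mathbf{X}}^T \tag{6-50} \]


If the random process has a zero mean, then $\mathbf{\Lambda}_X = \mathbf{R}_X$.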
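The step from the definition (6-48) to the relation (6-50) is left to the reader; one way to fill it in, using only the linearity of the expectation and the fact that the mean vector is a constant, is sketched below.

```latex
\begin{aligned}
\mathbf{\Lambda}_X
  &= E\left[(\mathbf{X} - \bar{\mathbf{X}})(\mathbf{X}^T - \bar{\mathbf{X}}^T)\right] \\
  &= E[\mathbf{X}\mathbf{X}^T] - E[\mathbf{X}]\,\bar{\mathbf{X}}^T
     - \bar{\mathbf{X}}\,E[\mathbf{X}^T] + \bar{\mathbf{X}}\bar{\mathbf{X}}^T \\
  &= \mathbf{R}_X - \bar{\mathbf{X}}\bar{\mathbf{X}}^T
     - \bar{\mathbf{X}}\bar{\mathbf{X}}^T + \bar{\mathbf{X}}\bar{\mathbf{X}}^T \\
  &= \mathbf{R}_X - \bar{\mathbf{X}}\bar{\mathbf{X}}^T
\end{aligned}
```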

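The above representation for the covariance matrix is valid for both stationary
and nonstationary processes. In the case of a wide-sense stationary process, however,
all the variances are the same and the correlation coefficients in a given
diagonal are the same. Thus,


\[ \sigma_i^2 = \sigma_j^2 = \sigma^2 \qquad i, j = 1, 2, \ldots, N \]
\[ \rho_{ij} = \rho_{|i-j|} \qquad i, j = 1, 2, \ldots, N \]

and

\[
\mathbf{\Lambda}_X = \sigma^2 \begin{bmatrix}
1 & \rho_1 & \rho_2 & \cdots & \rho_{N-1} \\
\rho_1 & 1 & \rho_1 & \cdots & \rho_{N-2} \\
\rho_2 & \rho_1 & 1 & \cdots & \vdots \\
\vdots & & & \ddots & \rho_1 \\
\rho_{N-1} & \cdots & & \rho_1 & 1
\end{bmatrix}
\tag{6-51}
\]


Such a matrix is said to be Toeplitz.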
As an illustration of some of the above concepts, suppose we have a stationary
random process whose autocorrelation function is given by

\[ R_X(\tau) = 10e^{-|\tau|} + 9 \tag{6-52} \]


To keep the example simple, assume that three random variables separated by one
second are to be considered. Thus, N = 3 and $\Delta t = 1$. Evaluating (6-52) for $\tau$
= 0, 1, 2 yields the values that are needed for the correlation matrix. Thus, the
correlation matrix becomes


\[
\mathbf{R}_X = \begin{bmatrix}
19 & 12.68 & 10.35 \\
12.68 & 19 & 12.68 \\
10.35 & 12.68 & 19
\end{bmatrix}
\]


Since the variance of this process is 10 and its mean value is ±3, the covariance
matrix is

\[
\mathbf{\Lambda}_X = 10 \begin{bmatrix}
1 & 0.368 & 0.135 \\
0.368 & 1 & 0.368 \\
0.135 & 0.368 & 1
\end{bmatrix}
\]
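The numbers in this example can be reproduced with a short NumPy sketch that evaluates (6-52) at the required lags and then applies (6-50). The variable names are arbitrary, and the mean is taken as +3; the sign does not matter because only the outer product of the mean vector with itself enters.

```python
import numpy as np

def r_x(tau):
    """Autocorrelation function of (6-52)."""
    return 10.0 * np.exp(-np.abs(tau)) + 9.0

n_samples, dt = 3, 1.0
idx = np.arange(n_samples)
lags = np.abs(idx[:, None] - idx[None, :]) * dt

R = r_x(lags)                                   # correlation matrix
mean_vec = np.full((n_samples, 1), 3.0)         # mean value of each sample
cov = R - mean_vec @ mean_vec.T                 # covariance matrix from (6-50)

print(np.round(R, 2))
print(np.round(cov / 10.0, 3))
```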


Another situation in which the use of vector notation is convenient arises when 
the random variables come from different random processes. In this case, the 
vector representing all the random variables might be written as 


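\[
\mathbf{X}(t) = \begin{bmatrix} X_1(t) \\ X_2(t) \\ \vdots \\ X_N(t) \end{bmatrix}
\]

The correlation matrix is now defined as

\[
\mathbf{R}_X(\tau) = E[\mathbf{X}(t)\mathbf{X}^T(t + \tau)] = \begin{bmatrix}
R_1(\tau) & R_{12}(\tau) & \cdots & R_{1N}(\tau) \\
R_{21}(\tau) & R_2(\tau) & & \vdots \\
\vdots & & \ddots & \\
R_{N1}(\tau) & \cdots & & R_N(\tau)
\end{bmatrix}
\tag{6-53}
\]

in which

\[ R_i(\tau) = E[X_i(t)X_i(t + \tau)] \]
\[ R_{ij}(\tau) = E[X_i(t)X_j(t + \tau)] \]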


Note that in this case, the elements of the correlation matrix are functions of $\tau$
rather than numbers as they were in the case of the correlation matrix associated
with samples taken from a single random process. Situations in which such a
correlation matrix might occur arise in connection with antenna arrays or arrays
of seismic detectors. In such systems, the noise signals at each antenna element, 
or each seismic detector, may be from different, but correlated, random processes. 

Before we leave the subject of covariance matrices, it is worth noting the im- 
portant role that these matrices play in connection with the joint probability den- 
sity function for N random variables from a Gaussian process. It was noted earlier 
that the Gaussian process was one of the few for which it is possible to write a 
joint probability density function for any number of random variables. The deri- 
vation of this joint density function is beyond the scope of this discussion, but it 
can be shown that it becomes 


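\[
f(\mathbf{x}) = f[x(t_1), x(t_2), \ldots, x(t_N)]
= \frac{1}{(2\pi)^{N/2}\,|\mathbf{\Lambda}_X|^{1/2}}
\exp\left[-\frac{1}{2}(\mathbf{x}^T - \bar{\mathbf{X}}^T)\,\mathbf{\Lambda}_X^{-1}\,(\mathbf{x} - \bar{\mathbf{X}})\right]
\tag{6-54}
\]

where $|\mathbf{\Lambda}_X|$ is the determinant of $\mathbf{\Lambda}_X$ and $\mathbf{\Lambda}_X^{-1}$ is its inverse.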
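A minimal NumPy sketch of evaluating (6-54) is given below. The function name is arbitrary, and the covariance matrix and mean are borrowed from the three-sample illustration earlier in this section purely as test values; solving a linear system instead of forming the inverse explicitly is a numerical convenience, not something the text prescribes.

```python
import numpy as np

def gaussian_joint_density(x, mean, cov):
    """Evaluate the joint Gaussian density of (6-54) at the point x."""
    x = np.asarray(x, dtype=float)
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    n = x.size
    diff = x - mean
    # Quadratic form (x - mean)^T cov^{-1} (x - mean) without an explicit inverse.
    quad = diff @ np.linalg.solve(cov, diff)
    norm = (2.0 * np.pi) ** (n / 2.0) * np.sqrt(np.linalg.det(cov))
    return np.exp(-0.5 * quad) / norm

# Test values taken from the three-sample illustration (mean 3, variance 10).
cov = 10.0 * np.array([[1.0, 0.368, 0.135],
                       [0.368, 1.0, 0.368],
                       [0.135, 0.368, 1.0]])
mean = np.array([3.0, 3.0, 3.0])
peak = gaussian_joint_density(mean, mean, cov)
off_peak = gaussian_joint_density([4.0, 2.0, 3.0], mean, cov)
```

The density is largest at the mean vector, so peak exceeds off_peak.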





Exercise 6—9.1 
A random process has an autocorrelation function of the form 


\[ R_X(\tau) = 10e^{-|\tau|} \cos 2\pi\tau \]


Write the correlation matrix associated with four random variables de- 
fined for time instants separated by 0.5 seconds. 


Answers: Elements in the first row include 3.677, 2.228, 10.0, 6.064 


Exercise 6—9.2 
A covariance matrix for a stationary random process has the form 
1     0.6   0.4   __
__    1     0.6   __


0.4   0.6   __    0.6
EC oet 


Fill in the blank spaces in this matrix. 


Answers: 1, 0.6, 0.2, 0.4


PROBLEMS


6-1.1 A stationary random process having sample functions of X(t) has an au- 
tocorrelation function of 


Ry(r) = Se 5 
Another random process has sample functions of 


Y(t) = X(t) + bX(t — 0.1) 


a) Find the value of b that minimizes the mean-square value of Y(t). 
b) Find the value of the minimum mean-square value of Y(t). 


c) If |b| = 1, find the maximum mean-square value of Y(t).


6-1.2 For each of the autocorrelation functions given below, state whether the
process it represents might be wide-sense stationary or cannot be wide-sense
stationary.

a) Ry (to t) = ele? 
b) $R_X(t_1, t_2) = \cos t_1 \cos t_2 + \sin t_1 \sin t_2$
c) Ry (ty, h) 


d) sinf} COS/; — COSI, sint 
By dh Bj maa 
th — f 


(r,2— 


e 152) 


6-2.1 Consider a stationary random process having sample functions of the form 
shown below: 





[Figure: sample function of the process, a train of rectangular pulses]


At periodic time instants $t_0 + nT$, a rectangular pulse of unit height and
width $T_1$ may appear, or not appear, with equal probability and independently
from interval to interval. The time $t_0$ is a random variable that is
uniformly distributed over the period T and $T_1 = T/2$.


a) Find the mean value and the mean-square value of this process. 


b) Find the autocorrelation function of this process. 


6-2.2 Find the time autocorrelation function of the sample function in Problem 
6-2.1. 


6-2.3 Consider a stationary random process having sample functions of the form 
\[ X(t) = \sum_{n=-\infty}^{\infty} A_n\, g(t - t_0 - nT) \]

in which the $A_n$ are independent random variables that are +1 or −1 with equal
probability and $t_0$ is a random variable that is uniformly distributed over the period
T. Define a function

\[ G(\tau) = \int_{-\infty}^{\infty} g(t)\, g(t + \tau)\, dt \]

and express the autocorrelation function of the process in terms of $G(\tau)$.


6-3.1 Which of the functions shown below cannot be valid autocorrelation func- 
tions? For each case explain why it is not an autocorrelation function. 





[Figure: four candidate functions $g(\tau)$ plotted against $\tau$, panels (a), (b), (c), and (d)]


6-3.2 A random process has sample functions of the form 


6-3.3 


6-4.1
\[ X(t) = Y \cos(\omega_0 t + \theta) \]
in which Y, $\omega_0$, and $\theta$ are statistically independent random variables. Assume
that Y has a mean value of 3 and a variance of 9, that $\theta$ is uniformly
distributed from $-\pi$ to $\pi$, and that $\omega_0$ is uniformly distributed from −6
to +6.
a) Is this process stationary? Is it ergodic? 
b) Find the mean and mean-square value of the process. 


c) Find the autocorrelation function of the process. 


A stationary random process has an autocorrelation function of the form 


Ry(T) = 100e ^? cos277T + 10 cos67T + 36 


a) Find the mean value, mean-square value, and the variance of this 
process. 


b) What discrete frequency components are present? 


c) Find the smallest value of $\tau$ for which the random variables X(t) and
X(t + $\tau$) are uncorrelated.


Consider a function of $\tau$ of the form


V(T) 


I 
ae 
| 
N| 
eee 
E 
IA 
m] 


0 FT 


Take the Fourier transform of this function and show that it is a valid 
autocorrelation function only for T — 2. 


A stationary random process is sampled at time instants separated by 0.01 
seconds. The sample values are: 


 k     x_k      k     x_k      k     x_k

 0     0.19     7    -1.24    14     1.45
 1     0.29     8    -1.88    15    -0.82
 2     1.44     9    -0.31    16    -0.25
 3     0.83    10     1.18    17     0.23
 4    -0.01    11     1.70    18    -0.91
 5     1.23    12     0.57    19    -0.19
 6    -1.47    13     0.95    20     0.24

a) Find the sample mean.


b) Find the estimated autocorrelation function $\hat{R}_X(0.01n)$ for n = 0, 1,
2, 3 using Equation (6-15).


c) Repeat (b) using Equation (6—16). 


a) For the data of Problem 6—4.1, find an upper bound on the variance 
of the estimated autocorrelation function using the estimated values 
of part (b). 


b) Repeat (a) using the estimated values of part (c). 


Assume that the true autocorrelation function of the random process from 
which the data of Problem 6—4.1 comes has the form 


\[ R_X(\tau) = A\left[1 - \frac{|\tau|}{T}\right] \qquad |\tau| \le T \]
and is zero elsewhere. 


a) Find the values of A and T that provide the best fit to the estimated 
autocorrelation function values of Problem 6—4.1(b) in the least- 
mean-square sense. (See Sec. 4—6.) 


b) Using the results of part (a) and Equation (6—18), find another upper 
bound on the variance of the estimate of the autocorrelation function. 
Compare with the result of Problem 6—4.2(a). 


A random process has an autocorrelation function of the form 
Ry(7) = 10e~ > cos207 


If this process is sampled every 0.01 seconds, find the number of samples 
required to estimate the autocorrelation function with a standard deviation 
that is no more than 1% of the variance of the process. 


Consider a random process having sample functions of the form shown in 
Figure 6—4(a) and assume that the time intervals between switching times 
are independent, exponentially distributed random variables. (see Sec. 2— 
7.) Show that the autocorrelation function of this process is a two-sided 
exponential as shown in Figure 6—4(b). 


Suppose that each sample function of the random process in Problem 6-5.1
is switching between 0 and 2A instead of between ±A. Find the
autocorrelation function of the process now.


Determine the mean value and the variance of each of the random pro- 
cesses having the following autocorrelation functions: 





a) 10e? 

b) 10e? cos2T T^ 
2 

C T -8 

10 = 
TA 


Consider a random process having an autocorrelation function of 


Ry(r) = 10e 7| — 54-4 


a) Find the mean and variance of this process. 


b) Is this process differentiable? Why? 


Two independent stationary random processes having sample functions of 
X(t) and Y(t) have autocorrelation functions of 


Ry(t) = 25e~ 91 cos100s7 
and 


\[ R_Y(\tau) = 16\,\frac{\sin 50\pi\tau}{50\pi\tau} \]


a) Find the autocorrelation function of X(t) + Y(t).
b) Find the autocorrelation function of X(t) − Y(t).


c) Find both crosscorrelation functions of the two processes defined by
(a) and (b).


d) Find the autocorrelation function of X(t)Y(t).


For the two processes of Problem 6—7.1(c) find the maximum value that 
the crosscorrelation functions can have using the bound of Equation (6— 
29). Compare this bound with the actual maximum values that these cross- 
correlation functions have.


A stationary random process has an autocorrelation function of


\[ R_X(\tau) = \frac{\sin \tau}{\tau} \]


a) Find R;,(r). 


b) Find R;(7). 


Two stationary random processes have a crosscorrelation function of 
Rxy(r) = 16e €^? 

Find the crosscorrelation function of the derivative of X(t) and Y(t). That
is, find $R_{\dot{X}Y}(\tau)$.


A sinusoidal signal has the form 
\[ X(t) = 0.01 \sin(100t + \theta) \]


in which $\theta$ is a random variable that is uniformly distributed between $-\pi$
and $\pi$. This signal is observed in the presence of independent noise whose
autocorrelation function is 


Ry(r) = 10e7 1 


a) Find the value of the autocorrelation function of the sum of signal
and noise at $\tau = 0$.


b) Find the smallest value of $\tau$ for which the peak value of the autocorrelation
function of the signal is ten times larger than the autocorrelation
function of the noise.


One way of detecting a sinusoidal signal in noise is to use a correlator. 
In this device, the incoming signal plus noise is multiplied by a locally 
generated reference signal having the same form as the signal to be de- 
tected and the average value of the product is extracted with a low pass 
filter. Suppose the signal and noise of Problem 6—8.1 are multiplied by a 
reference signal of the form 


\[ r(t) = 10 \cos(100t + \phi) \]


The product is


\[ Z(t) = r(t)X(t) + r(t)N(t) \]

a) Find the expected value of Z(t) where the expectation is taken with
respect to the noise and $\phi$ is assumed to be a fixed, but unknown,
value.


b) For what value of $\phi$ is the expected value of Z(t) the greatest?


Vibration sensors are mounted on the front and rear axles of a moving 
vehicle to pick up the random vibrations due to the roughness of the road 
surface. The signal from the front sensor may be modeled as 


\[ f(t) = s(t) + n_1(t) \]


where the signal s(t) and the noise $n_1(t)$ are from independent random
processes. The signal from the rear sensor is modeled as


\[ r(t) = s(t - \tau_1) + n_2(t) \]


where $n_2(t)$ is noise that is independent of both s(t) and $n_1(t)$. All processes
have zero mean. The delay $\tau_1$ depends upon the spacing of the
sensors and the speed of the vehicle.


a) If the sensors are placed 5 meters apart, derive a relationship between
$\tau_1$ and the vehicle speed v.


b) Sketch a block diagram of a system that can be used to measure 
vehicle speed over a range of 5 meters per second to 50 meters per 
second. Specify the maximum and minimum delay values that are 
required if an analog correlator is used. 


c) Why is there a minimum speed that can be measured this way? 


d) If a digital correlator is used, and the signals are each sampled at a 
rate of 12 samples per second, what is the maximum vehicle speed 
that can be measured? 


The angle to distant stars can be measured by crosscorrelating the outputs 
of two widely separated antennas and measuring the delay required to 
maximize the crosscorrelation function. The geometry to be considered is 
shown on p. 228. In this system, the distance between antennas is nominally
500 meters, but has a standard deviation of 0.01 meters. It is desired
to measure the angle $\theta$ with a standard deviation of no more than 1 milliradian
for any $\theta$ between 0 and 1.4 radians. Find an upper bound on the
standard deviation of the delay measurement in order to accomplish this.
Hint: Use the total differential to linearize the relation.


6-9.1 A stationary random process having an autocorrelation function of


Ry(t) = 36e 7" coser 


is sampled at periodic time instants separated by 0.5 seconds. Write the 
covariance matrix for four consecutive samples taken from this process. 


6—9.2 A Gaussian random vector 


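\[
\mathbf{X} = \begin{bmatrix} X_1 \\ X_2 \\ X_3 \end{bmatrix}
\]

has a covariance matrix of

\[
\mathbf{\Lambda} = \begin{bmatrix}
1 & 0.5 & 0 \\
0.5 & 1 & 0.5 \\
0 & 0.5 & 1
\end{bmatrix}
\]


Find the expected value, $E[\mathbf{X}^T \mathbf{\Lambda}^{-1} \mathbf{X}]$.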


6—9.3 A transversal filter is a tapped delay line with the outputs from the various 
taps weighted and summed as shown below. 


[Figure: transversal filter (tapped delay line) with input X(t) and output Y(t)]


If the delay between taps is $\Delta t$ the outputs from the taps can be expressed
as a vector by

\[
\mathbf{X}(t) = \begin{bmatrix} X(t) \\ X(t - \Delta t) \\ \vdots \\ X(t - N\Delta t) \end{bmatrix}
\]


Likewise, the weighting factors on the various taps can be written as a
vector


\[
\mathbf{a} = \begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_N \end{bmatrix}
\]


a) Write an expression for the output of the transversal filter, Y(t), in
terms of the vectors $\mathbf{X}(t)$ and $\mathbf{a}$.


b) If X(t) is from a stationary random process with an autocorrelation
function of $R_X(\tau)$, write an expression for the autocorrelation function
$R_Y(\tau)$.


6-9.4 Let the input to the transversal filter of Problem 6—9.3 have an autocor- 
relation function of 


\[ R_X(\tau) = 1 - \frac{|\tau|}{\Delta t} \qquad |\tau| \le \Delta t \]


and is zero elsewhere.
a) If the transversal filter has 4 taps (i.e., N = 3) and the weighting
factor for each tap is $a_i = 1$ for all i, determine and sketch the
autocorrelation function of the output.


b) Repeat part (a) if the weighting factors are $a_i = 4 - i$, i = 0,
1, 2, 3.


References 


See the References for Chapter 1. Of particular interest for the material of this chapter are the 
books by Davenport and Root, Helstrom, and Papoulis. 


CHAPTER 7


Spectral Density


7-1 Introduction


The use of Fourier transforms and Laplace transforms in the analysis of linear 
systems is widespread and frequently leads to much saving in labor. The principal 
reason for this simplification is that the convolution integral of time-domain meth- 
ods is replaced by simple multiplication when frequency-domain methods are 
used. 

In view of this widespread use of frequency-domain methods, it is natural to 
ask if such methods are still useful when the inputs to the system are random. The 
answer to this question is that they are still useful but that some modifications are 
required and that a little more care is necessary in order to avoid pitfalls. How- 
ever, when properly used, frequency-domain methods offer essentially the same 
advantages in dealing with random signals as they do with nonrandom signals. 

Before beginning this discussion, it is desirable to review briefly the frequency- 
domain representation of a nonrandom time function. The most natural represen- 
tation of this sort is the Fourier transform, which leads to the concept of frequency 
spectrum. Thus, the Fourier transform of some nonrandom time function, f(t), is 
defined to be 


\[ F(\omega) = \int_{-\infty}^{\infty} f(t)\,e^{-j\omega t}\,dt \tag{7-1} \]


If f(t) is a voltage, say, then $F(\omega)$ has the units of volts per rad/s and represents
the relative magnitude and phase of steady-state sinusoids (of frequency $\omega$) that
can be summed to produce the original f(t). Thus, the magnitude of the Fourier
transform has the physical significance of being the amplitude density as a func- 
tion of frequency and as such gives a clear indication of how the energy of f(t) is 
distributed with respect to frequency. 

It might seem reasonable to use exactly the same procedure in dealing with
random signals, that is, to use the Fourier transform of any particular sample
function x(t), defined by


\[ F_X(\omega) = \int_{-\infty}^{\infty} x(t)\,e^{-j\omega t}\,dt \]


as the frequency-domain representation of the random process. This is not possi- 
ble, however, for at least two reasons. In the first place, the Fourier transform 
will be a random variable over the ensemble (for any fixed $\omega$) since it will have a
different value for each member of the ensemble of possible sample functions.
Hence, it cannot be a frequency representation of the process, but only of one
member of the process. However, it might still be possible to use this function by
finding its expected value (or mean) over the ensemble if it were not for the
second reason. The second, and more basic, reason for not using the $F_X(\omega)$ just
defined is that, for stationary processes at least, it almost never exists! It may
be recalled that one of the conditions for a time function to be Fourier transform- 
able is that it be absolutely integrable; that is, 


\[ \int_{-\infty}^{\infty} |x(t)|\,dt < \infty \tag{7-2} \]


This condition can never be satisfied by any nonzero sample function from a wide- 
sense stationary random process. The Fourier transform in the ordinary sense will 
never exist in this case, although it may occasionally exist in the sense of gener- 
alized functions, including impulses, and so forth. 

Now that the usual Fourier transform has been ruled out as a means of obtaining 
a frequency-domain representation for a random process, the next thought is to 
use the Laplace transform, since this contains a built-in convergence factor. Of 
course, the usual one-sided transform, which considers f(t) for t ≥ 0 only, is not
applicable for a wide-sense stationary process; however, this is no real difficulty 
since the two-sided Laplace transform is good for negative values of time as well 
as positive. Once this is done, the Laplace transform for almost any sample func- 
tion from a stationary random process will exist. 

It turns out, however, that this approach is not so promising as it looks since it 
merely transfers the existence problems from the transform to the inverse trans- 
form. A study of these problems requires a knowledge of complex variable theory 
that is beyond the scope of the present discussion. Hence, it appears that the 
simplest mathematically-acceptable approach is to return to the Fourier transform 
and employ an artifice that will ensure existence. Even in this case it will not be
possible to justify rigorously all the steps, and a certain amount of the procedure
will have to be accepted on faith. 


7-2 Relation of Spectral Density to the Fourier Transform


In order to use the Fourier transform technique it is necessary to modify the sam- 
ple functions of a stationary random process in such a way that the transform of 
each sample function exists. There are many ways in which this might be done, 
but the simplest one is to define a new sample function having finite duration. 
Thus, let 


\[
X_T(t) = \begin{cases} X(t) & |t| \le T < \infty \\ 0 & |t| > T \end{cases}
\tag{7-3}
\]


and note that the truncated time function $X_T(t)$ will satisfy the condition of (7-2),
as long as T remains finite, provided that the stationary process from which it is
taken has a finite mean-square value. Hence, $X_T(t)$ will be Fourier transformable.
In fact, $X_T(t)$ will satisfy the more stringent requirement for integrable square
functions; that is


\[ \int_{-\infty}^{\infty} |X_T(t)|^2\,dt < \infty \tag{7-4} \]


This condition will be needed in the subsequent development. 
Since $X_T(t)$ is Fourier transformable, its transform may be written as


\[ F_X(\omega) = \int_{-\infty}^{\infty} X_T(t)\,e^{-j\omega t}\,dt \qquad T < \infty \tag{7-5} \]


Eventually, it will be necessary to let T increase without limit; the purpose of the
following discussion is to show that the expected value of $|F_X(\omega)|^2$ does exist in
the limit even though the $F_X(\omega)$ for any one sample function does not. The first
step in demonstrating this is to apply Parseval's theorem to $X_T(t)$ and $F_X(\omega)$.¹
Thus, since $X_T(t) = 0$ for |t| > T,


\[ \int_{-T}^{T} X_T^2(t)\,dt = \frac{1}{2\pi}\int_{-\infty}^{\infty} |F_X(\omega)|^2\,d\omega \tag{7-6} \]


¹Parseval's theorem states that if f(t) and g(t) are transformable time functions with transforms of $F(\omega)$
and $G(\omega)$ respectively, then


\[ \int_{-\infty}^{\infty} f(t)\,g(t)\,dt = \frac{1}{2\pi}\int_{-\infty}^{\infty} F(\omega)\,G(-\omega)\,d\omega \]


Note that $|F_X(\omega)|^2 = F_X(\omega)F_X(-\omega)$ since $F_X(-\omega)$ is the complex conjugate of
$F_X(\omega)$ when $X_T(t)$ is a real time function.
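The energy relation (7-6) can be checked numerically on a sampled, truncated time function. The sketch below is a discrete approximation only: the waveform, the value of T, and the number of samples are hypothetical, and the FFT scaled by the sampling interval stands in for $F_X(\omega)$ on a grid of frequencies (its phase differs from the true transform, but only the magnitude is used here).

```python
import numpy as np

T = 2.0                       # half-width of the observation interval (assumed)
n = 4096                      # number of samples across -T..T (assumed)
dt = 2.0 * T / n
t = -T + dt * np.arange(n)

# Hypothetical truncated sample function on |t| <= T.
x_T = np.cos(3.0 * t) + 0.5 * np.sin(7.0 * t)

# dt * FFT approximates the Fourier transform magnitude on a frequency grid.
F = dt * np.fft.fft(x_T)
d_omega = 2.0 * np.pi / (n * dt)          # spacing of the frequency grid, rad/s

energy_time = np.sum(x_T ** 2) * dt                           # left side of (7-6)
energy_freq = np.sum(np.abs(F) ** 2) * d_omega / (2.0 * np.pi)  # right side of (7-6)
print(energy_time, energy_freq)
```

The two printed values agree to within rounding error, which is the discrete counterpart of Parseval's theorem.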

Since the quantity being sought is the distribution of average power as a function
of frequency, the next step is to average both sides of (7-6) over the total
time, 2T. Hence, dividing both sides by 2T gives

\[ \frac{1}{2T}\int_{-T}^{T} X_T^2(t)\,dt = \frac{1}{4\pi T}\int_{-\infty}^{\infty} |F_X(\omega)|^2\,d\omega \tag{7-7} \]
The left side of (7-7) is seen to be proportional to the average power of the sample 
function in the time interval from — T to T. More exactly, it is the square of the 
effective value of $X_T(t)$. Furthermore, for an ergodic process, this quantity would
approach the mean-square value of the process as T approached infinity. 

However, it is not possible at this stage to let T approach infinity, since $F_X(\omega)$
simply does not exist in the limit. It should be remembered, though, that $F_X(\omega)$
is a random variable with respect to the ensemble of sample functions from which
X(t) was taken. It is reasonable to suppose (and can be rigorously proved) that the
limit of the expected value of $(1/T)|F_X(\omega)|^2$ does exist since the integral of this
"always positive" quantity certainly does exist, as shown by (7-4). Hence, taking
the expectation of both sides of (7-7), interchanging the expectation and integration,
and then taking the limit as $T \to \infty$, we obtain


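\[
\begin{aligned}
E\left[\frac{1}{2T}\int_{-T}^{T} X_T^2(t)\,dt\right]
  &= E\left[\frac{1}{4\pi T}\int_{-\infty}^{\infty} |F_X(\omega)|^2\,d\omega\right] \\
\lim_{T\to\infty}\frac{1}{2T}\int_{-T}^{T} \overline{X^2}\,dt
  &= \lim_{T\to\infty}\frac{1}{4\pi T}\int_{-\infty}^{\infty} E[|F_X(\omega)|^2]\,d\omega \\
\langle \overline{X^2} \rangle
  &= \frac{1}{2\pi}\int_{-\infty}^{\infty} \lim_{T\to\infty}\frac{E[|F_X(\omega)|^2]}{2T}\,d\omega
\end{aligned}
\tag{7-8}
\]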

For a stationary process, the time average of the mean-square value is equal to
the mean-square value and (7-8) can be written as


\[ \overline{X^2} = \frac{1}{2\pi}\int_{-\infty}^{\infty} \lim_{T\to\infty}\frac{E[|F_X(\omega)|^2]}{2T}\,d\omega \tag{7-9} \]


The integrand of the right side of (7-9), which will be designated by the symbol
$S_X(\omega)$, is called the spectral density of the random process. Thus,


\[ S_X(\omega) = \lim_{T\to\infty}\frac{E[|F_X(\omega)|^2]}{2T} \tag{7-10} \]

and it must be remembered that it is not possible to let $T \to \infty$ before taking the
expectation. If X(t) is a voltage, say, then $S_X(\omega)$ has the units of volts² per hertz,
and its integral, as shown by (7-9), leads to the mean-square value; that is,

\[ \overline{X^2} = \frac{1}{2\pi}\int_{-\infty}^{\infty} S_X(\omega)\,d\omega \tag{7-11} \]
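The definition (7-10) and the integral (7-11) can be illustrated with a finite-T, finite-ensemble approximation. In the sketch below the ensemble is a hypothetical set of independent unit-variance Gaussian samples, the expectation is replaced by an average over ensemble members, and T is held fixed rather than taken to the limit; all of these are assumptions made for the sake of a runnable example.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 8.0                        # half-width of the observation interval (assumed)
n = 1024                       # samples per member across -T..T (assumed)
dt = 2.0 * T / n
n_members = 200                # number of ensemble members (assumed)

# Hypothetical ensemble of truncated sample functions.
ensemble = rng.standard_normal((n_members, n))

# dt * FFT approximates F_X(w) for each member on a frequency grid.
F = dt * np.fft.fft(ensemble, axis=1)

# Average |F_X|^2 over the ensemble first, then divide by 2T, as in (7-10).
S_est = np.mean(np.abs(F) ** 2, axis=0) / (2.0 * T)

d_omega = 2.0 * np.pi / (n * dt)
mean_square_from_S = np.sum(S_est) * d_omega / (2.0 * np.pi)   # discrete form of (7-11)
mean_square_direct = np.mean(ensemble ** 2)
print(mean_square_from_S, mean_square_direct)
```

The two mean-square values agree because the discrete transform preserves energy; note that the ensemble average is taken before any attempt to enlarge T, in keeping with the caution in the text.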


The physical interpretation of spectral density can be made somewhat clearer
by thinking in terms of average power, although this is a fairly specialized way
of looking at it. If X(t) is a voltage or current associated with a 1 Ω resistance,
then $\overline{X^2}$ is just the average power dissipated in that resistance. The spectral
density, $S_X(\omega)$, can then be interpreted as the average power associated with a
bandwidth of 1 Hz centered at $\omega/2\pi$ Hz. [Note that the unit of bandwidth is the hertz
(or cycle per second) and not the radian per second, because of the factor of
$1/(2\pi)$ in the integral of (7-11).] Because the relationship of the spectral density
to the average power of the random process is often the one of interest, the spectral
density is frequently designated as the "power density spectrum."

The spectral density defined above is sometimes referred to as the "two-sided
spectral density" since it exists for both positive and negative values of $\omega$. Some
authors prefer to define a "one-sided spectral density," which is usually expressed
as a function of $f = \omega/2\pi$ and exists only for positive values of f. If this one-sided
spectral density is designated by $G_X(f)$, then the mean-square value of the
random process is given by


\[ \overline{X^2} = \int_{0}^{\infty} G_X(f)\,df \tag{7-12} \]


Since the one-sided spectral density is defined for positive frequencies only, it 
may be related to the two-sided spectral density by 


\[
G_X(f) = \begin{cases} 2S_X(2\pi f) & f \ge 0 \\ 0 & f < 0 \end{cases}
\tag{7-13}
\]
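A quick numerical check shows that (7-11) and (7-12) give the same mean-square value when the two densities are related by (7-13). The spectral density used below is a hypothetical example, and the infinite limits are truncated at an assumed finite frequency, so the result falls slightly short of the exact value of 4 for this density.

```python
import math

def s_x(omega):
    """Two-sided spectral density (hypothetical example)."""
    return 32.0 / (omega ** 2 + 16.0)

def g_x(f):
    """One-sided spectral density obtained from (7-13)."""
    return 2.0 * s_x(2.0 * math.pi * f) if f >= 0.0 else 0.0

def trapezoid(func, a, b, n):
    """Composite trapezoidal rule with n equal subintervals."""
    h = (b - a) / n
    total = 0.5 * (func(a) + func(b))
    for k in range(1, n):
        total += func(a + k * h)
    return total * h

W = 2000.0   # truncation frequency in rad/s (assumed; the true limits are infinite)
two_sided = trapezoid(s_x, -W, W, 400_000) / (2.0 * math.pi)      # (7-11)
one_sided = trapezoid(g_x, 0.0, W / (2.0 * math.pi), 200_000)     # (7-12)
print(two_sided, one_sided)
```

The factor of 2 in (7-13) accounts for folding the negative frequencies onto the positive ones, and the change of variable from ω to f absorbs the factor of 1/(2π).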


Although both the one-sided spectral density and the two-sided spectral density 
are commonly used in the engineering literature, in the interest of consistency this 
text will use only the two-sided spectral density. However, the reader is cautioned 
that other references may use either and it is essential to be aware of the definition 
being employed. 

The foregoing analysis of spectral density has been carried out in somewhat 
more detail than is customary in an introductory discussion. The reason for this is 
an attempt to avoid some of the mathematical pitfalls that a more superficial ap- 
proach might gloss over. There is no doubt that this method makes the initial 
study of spectral density more difficult for the reader, but it is felt that the addi- 
tional rigor is well worth the effort. Furthermore, even if all of the implications 
of the discussion are not fully understood, it should serve to make the reader 
aware of the existence of some of the less obvious difficulties of frequency-do- 
main methods. 

Another approach to spectral density, which treats it as a defined quantity based 
on the autocorrelation function, is given in Section 7-6. From the standpoint of
application, such a definition is probably more useful than the more basic approach
given here and is also easier to understand. It does not, however, make
the physical interpretation as apparent as the basic derivation does. 

Before turning to a more detailed discussion of the properties of spectral den- 
sities, it may be noted that in system analysis the spectral density of the input 
random process will play the same role as does the transform of the input in the 
case of nonrandom signals. The major difference is that spectral density represents 
a power density rather than a voltage density. Thus, it will be necessary to define 
a power transfer function for the system rather than a voltage transfer function. 





Exercise 7—2.1 


A stationary random process has two-sided spectral density given by

\[
S_X(\omega) = \begin{cases} 16\pi & a \le |\omega| \le b \\ 0 & \text{elsewhere} \end{cases}
\]


a) Find the mean-square value of this process if a = 0 and b = 2.


b) Find the mean-square value of this process if a = 2 and b = 3.
Answers: 16, 32


Exercise 7-2.2

A stationary random process has a two-sided spectral density given by

$$ S_X(\omega) = \frac{32}{\omega^2 + 16} $$

a) Find the average power (on a 1-Ω basis) of this random process.

b) Find the average power (on a 1-Ω basis) associated with a
range of ω-values from −4 to +4.

Answers: 2, 4
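
As a check on integrals of this kind, the following minimal sketch evaluates the average power of Exercise 7-2.2 both in closed form and numerically. It assumes the density reads S_X(ω) = 32/(ω² + 16); the function names and the midpoint rule are illustrative choices, not part of the text.

```python
import math

def spectral_density(w):
    # Density assumed for Exercise 7-2.2: S_X(w) = 32 / (w^2 + 16)
    return 32.0 / (w * w + 16.0)

def band_power(w_lo, w_hi, n=20_000):
    """Midpoint-rule estimate of (1/2pi) * integral of S_X over [w_lo, w_hi]."""
    h = (w_hi - w_lo) / n
    area = sum(spectral_density(w_lo + (k + 0.5) * h) for k in range(n)) * h
    return area / (2.0 * math.pi)

def band_power_exact(w_lo, w_hi):
    """Same quantity from the antiderivative 8*atan(w/4)."""
    return 8.0 * (math.atan(w_hi / 4.0) - math.atan(w_lo / 4.0)) / (2.0 * math.pi)

# Total power: atan(w/4) tends to +pi/2 and -pi/2 at the two infinite limits,
# so the area under S_X is 8*pi.
total_power = 8.0 * math.pi / (2.0 * math.pi)

print(total_power, band_power(-4.0, 4.0), band_power_exact(-4.0, 4.0))
```

Half of the total power lies in the band from −4 to +4, which is the half-power bandwidth of this density.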
7-3 Properties of Spectral Density


Most of the important properties of spectral density are summarized by the simple
statement that it is a real, positive, even function of ω. It is known from the study
of Fourier transforms that their magnitude is certainly real and positive. Hence,
the expected value will also possess the same properties.

A special class of spectral densities, which is more commonly used than any
other, is said to be rational, since it is composed of a ratio of polynomials. Since
the spectral density is an even function of ω, these polynomials involve only even
powers of ω. Thus, it is represented by

$$ S_X(\omega) = \frac{S_0(\omega^{2n} + a_{2n-2}\omega^{2n-2} + \cdots + a_2\omega^2 + a_0)}{\omega^{2m} + b_{2m-2}\omega^{2m-2} + \cdots + b_2\omega^2 + b_0} \tag{7-14} $$

If the mean-square value of the random process is finite, then the area under S_X(ω)
must also be finite, from (7-11). In this case, it is necessary that m > n. This
condition will always be assumed here except for a very special case of white
noise. White noise is a term applied to a random process for which the spectral
density is constant for all ω; that is, S_X(ω) = S_0. Although such a process cannot
exist physically (since it has infinite mean-square value), it is a convenient
mathematical fiction, which greatly simplifies many computations that would otherwise
be very difficult. The justification and illustration of the use of this concept are
discussed in more detail later.

As an example of a rational spectral density consider the function

$$ S_X(\omega) = \frac{16(\omega^4 + 12\omega^2 + 32)}{\omega^6 + 18\omega^4 + 92\omega^2 + 120} = \frac{16(\omega^2 + 4)(\omega^2 + 8)}{(\omega^2 + 2)(\omega^2 + 6)(\omega^2 + 10)} $$

Note that this function satisfies all of the requirements that spectral densities be
real, positive, and even functions of ω. In addition, the denominator is of higher
degree than the numerator so that the spectral density vanishes at ω = ∞. Thus,
the process described by this spectral density will have a finite mean-square value.
The factored form of the spectral density is often useful in evaluating the integral
required to obtain the mean-square value of the process. This operation is
discussed in more detail in a subsequent section.
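
The stated properties of this rational example can be spot-checked numerically. The sketch below, with illustrative function names and an arbitrary set of sample frequencies, confirms that the expanded and factored forms agree and that the density is positive, even, and small at large ω.

```python
def s_expanded(w):
    """Rational example in expanded polynomial form."""
    return 16.0 * (w**4 + 12.0 * w**2 + 32.0) / (w**6 + 18.0 * w**4 + 92.0 * w**2 + 120.0)

def s_factored(w):
    """The same density written with factored numerator and denominator."""
    return 16.0 * (w**2 + 4.0) * (w**2 + 8.0) / ((w**2 + 2.0) * (w**2 + 6.0) * (w**2 + 10.0))

samples = [0.0, 0.5, 1.0, 3.0, 10.0, 100.0]   # arbitrary test frequencies
for w in samples:
    same = abs(s_expanded(w) - s_factored(w)) < 1e-12   # forms agree
    even = s_expanded(-w) == s_expanded(w)              # even function of w
    print(w, s_expanded(w), same, even)
```

Because only even powers of ω appear, the evenness check holds exactly in floating-point arithmetic, not just approximately.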

It is also possible to have spectral densities that are not rational. A typical 
example of this is the spectral density 





As is seen later, this is the spectral density of a random binary signal.
Spectral densities of this type are continuous and, as such, cannot represent
random processes having dc or periodic components. The reason is not difficult to
understand when spectral density is interpreted as average power per unit
bandwidth. Any dc component in a random process represents a finite average power
in zero bandwidth, since this component has a discrete frequency spectrum. Finite
power in zero bandwidth is equivalent to an infinite power density. Hence, we
would expect the spectral density in this case to be infinite at zero frequency but
finite elsewhere; that is, it would contain a δ function at ω = 0. A similar
argument for periodic components would justify the existence of δ functions at these
discrete frequencies. A rigorous derivation of these results will serve to make the
argument more precise and, at the same time, illustrate the use of the defining
equation, (7-10), in the calculation of spectral densities.

In order to carry out the desired derivation, consider a stationary random
process having sample functions of the form

$$ X(t) = A + B\cos(\omega_1 t + \theta) \tag{7-15} $$

where A, B, and ω₁ are constants and θ is a random variable uniformly distributed
from 0 to 2π; that is,

$$ f(\theta) = \begin{cases} \dfrac{1}{2\pi}, & 0 \le \theta \le 2\pi \\ 0, & \text{elsewhere} \end{cases} $$


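The Fourier transform of the truncated sample function, X_T(t), is

$$ F_X(\omega) = \int_{-T}^{T} [A + B\cos(\omega_1 t + \theta)]\, e^{-j\omega t}\, dt $$

$$ = \left. \frac{A e^{-j\omega t}}{-j\omega} \right|_{-T}^{T} + \left. \frac{B}{2}\, \frac{e^{j[(\omega_1 - \omega)t + \theta]}}{j(\omega_1 - \omega)} \right|_{-T}^{T} + \left. \frac{B}{2}\, \frac{e^{-j[(\omega_1 + \omega)t + \theta]}}{-j(\omega_1 + \omega)} \right|_{-T}^{T} $$

Substituting in the limits and simplifying leads immediately to

$$ F_X(\omega) = \frac{2A \sin \omega T}{\omega} + B\left[ \frac{e^{j\theta} \sin(\omega - \omega_1)T}{\omega - \omega_1} + \frac{e^{-j\theta} \sin(\omega + \omega_1)T}{\omega + \omega_1} \right] \tag{7-16} $$
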


The square of the magnitude of F_X(ω) will have nine terms, some of which are
independent of the random variable θ and the rest of which involve either e^{±jθ} or
e^{±j2θ}. In anticipation of the result that the expectation of all terms involving θ
will vanish, it is convenient to write the squared magnitude in symbolic form
without bothering to determine all the coefficients. Thus,

$$ |F_X(\omega)|^2 = \frac{4A^2 \sin^2 \omega T}{\omega^2} + B^2\left[ \frac{\sin^2(\omega - \omega_1)T}{(\omega - \omega_1)^2} + \frac{\sin^2(\omega + \omega_1)T}{(\omega + \omega_1)^2} \right] + C(\omega)e^{j\theta} + C(-\omega)e^{-j\theta} + D(\omega)e^{j2\theta} + D(-\omega)e^{-j2\theta} \tag{7-17} $$
Now consider the expected value of any term involving θ. These are all of the
form G(ω)e^{jnθ}, and the expected value is

$$ E[G(\omega)e^{jn\theta}] = G(\omega) \int_0^{2\pi} \frac{1}{2\pi}\, e^{jn\theta}\, d\theta = \left. \frac{G(\omega)}{2\pi}\, \frac{e^{jn\theta}}{jn} \right|_0^{2\pi} = 0, \qquad n = \pm 1, \pm 2, \ldots \tag{7-18} $$

Thus, the last four terms of (7-17) will vanish and the expected value will become

$$ E[|F_X(\omega)|^2] = 4A^2\left[ \frac{\sin^2 \omega T}{\omega^2} \right] + B^2\left[ \frac{\sin^2(\omega - \omega_1)T}{(\omega - \omega_1)^2} + \frac{\sin^2(\omega + \omega_1)T}{(\omega + \omega_1)^2} \right] \tag{7-19} $$





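From (7-10), the spectral density is

$$ S_X(\omega) = \lim_{T\to\infty} \frac{1}{2T} E[|F_X(\omega)|^2] = \lim_{T\to\infty} \left\{ 2A^2 T\left[ \frac{\sin \omega T}{\omega T} \right]^2 + \frac{B^2}{2} T\left[ \frac{\sin(\omega - \omega_1)T}{(\omega - \omega_1)T} \right]^2 + \frac{B^2}{2} T\left[ \frac{\sin(\omega + \omega_1)T}{(\omega + \omega_1)T} \right]^2 \right\} \tag{7-20} $$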


In order to investigate the limit, consider the essential part of the first term; that
is,

$$ \lim_{T\to\infty} T\left( \frac{\sin \omega T}{\omega T} \right)^2 = \ ? $$

This limit is clearly zero when ω is not zero since sin² ωT cannot exceed 1 and
the denominator increases as T. When ω = 0, however,

$$ \left. \left( \frac{\sin \omega T}{\omega T} \right)^2 \right|_{\omega = 0} = 1 $$

and the limit is infinite. Hence, one can write

$$ \lim_{T\to\infty} T\left( \frac{\sin \omega T}{\omega T} \right)^2 = K\delta(\omega) \tag{7-21} $$

where K represents the area of the δ function and has not yet been evaluated. The
value of K can be found by equating the areas of both sides of (7-21); that is,

$$ \lim_{T\to\infty} \int_{-\infty}^{\infty} T\left( \frac{\sin \omega T}{\omega T} \right)^2 d\omega = \int_{-\infty}^{\infty} K\delta(\omega)\, d\omega \tag{7-22} $$

The integral on the left is tabulated and has a value of π for all values of T > 0.
Thus, the limiting operation becomes trivial, and (7-22) leads to

$$ \pi = K $$
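
The claim that the area under T(sin ωT/ωT)² equals π for every T > 0 can be checked numerically. The sketch below is an illustration only: the function name, the midpoint rule, and the finite integration limits are assumptions, and truncating the infinite range leaves a small deficit of roughly 1/(ω_max·T).

```python
import math

def sinc_squared_area(T, w_max, n=200_000):
    """Midpoint-rule area of T*(sin(wT)/(wT))**2 over [-w_max, w_max]."""
    h = 2.0 * w_max / n
    total = 0.0
    for k in range(n):
        x = (-w_max + (k + 0.5) * h) * T
        # The integrand tends to T as x -> 0; guard the removable singularity.
        total += T * (math.sin(x) / x) ** 2 if x != 0.0 else T
    return total * h

area_T1 = sinc_squared_area(1.0, 500.0)   # T = 1
area_T5 = sinc_squared_area(5.0, 100.0)   # T = 5, same range in the variable wT
print(area_T1, area_T5, math.pi)
```

Both estimates fall just below π by about the same truncation error, which is consistent with the area being independent of T while the pulse becomes taller and narrower.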


An exactly similar procedure can be used for the other terms in (7-20). It is
left as an exercise for the reader to show that the final result becomes

$$ S_X(\omega) = 2\pi A^2\delta(\omega) + \frac{\pi}{2}B^2\delta(\omega - \omega_1) + \frac{\pi}{2}B^2\delta(\omega + \omega_1) \tag{7-23} $$


This spectral density is shown in Figure 7-1. 

It is of interest to determine the area of the spectral density in order to verify
that (7-23) does, in fact, lead to the proper mean-square value. Thus, according
to (7-11)

$$ \overline{X^2} = \frac{1}{2\pi} \int_{-\infty}^{\infty} \left[ 2\pi A^2\delta(\omega) + \frac{\pi}{2}B^2\delta(\omega - \omega_1) + \frac{\pi}{2}B^2\delta(\omega + \omega_1) \right] d\omega = \frac{1}{2\pi}\left[ 2\pi A^2 + \frac{\pi}{2}B^2 + \frac{\pi}{2}B^2 \right] = A^2 + \frac{1}{2}B^2 \tag{7-24} $$

The reader can verify easily that this same result would be obtained from the
ensemble average of X²(t).
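
The ensemble average mentioned here can be approximated by averaging X²(t) over equally spaced values of the phase θ. The sketch below uses arbitrary illustrative values of A, B, ω₁, and t, none of which come from the text.

```python
import math

def ensemble_mean_square(A, B, w1, t, n=10_000):
    """Average X(t)**2 over n equally spaced phases theta in [0, 2*pi),
    where X(t) = A + B*cos(w1*t + theta)."""
    total = 0.0
    for k in range(n):
        theta = 2.0 * math.pi * k / n
        total += (A + B * math.cos(w1 * t + theta)) ** 2
    return total / n

# Illustrative values only
A, B, w1 = 3.0, 4.0, 2.0
from_deltas = A**2 + 0.5 * B**2   # value obtained from the areas of the delta functions
print(ensemble_mean_square(A, B, w1, t=0.7), ensemble_mean_square(A, B, w1, t=5.0), from_deltas)
```

The average does not depend on the time t at which it is taken, as expected for a stationary process.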

A numerical example serves to illustrate discrete spectral densities. Suppose we
have a stationary random process having sample functions of the form

$$ X(t) = 5 + 10\sin(6t + \theta_1) + 8\cos(12t + \theta_2) $$

in which θ₁ and θ₂ are independent random variables and both are uniformly
distributed between 0 and 2π. Note that because the phases are uniformly distributed
over 2π radians, there is no difference between sine terms and cosine terms and
both can be handled with the results just discussed. This would not be true if the
distribution of phases were not uniform over this range. Using (7-23), the spectral
density of this process can be written immediately as

$$ S_X(\omega) = 2\pi(5)^2\delta(\omega) + \frac{\pi}{2}(10)^2\delta(\omega - 6) + \frac{\pi}{2}(10)^2\delta(\omega + 6) + \frac{\pi}{2}(8)^2\delta(\omega - 12) + \frac{\pi}{2}(8)^2\delta(\omega + 12) $$

$$ = \pi[50\delta(\omega) + 50\delta(\omega - 6) + 50\delta(\omega + 6) + 32\delta(\omega - 12) + 32\delta(\omega + 12)] $$

Figure 7-1 Spectral density of dc and sinusoidal components.
The mean-square value of the process can be obtained from (7-24) as

$$ \overline{X^2} = (5)^2 + \frac{1}{2}(10)^2 + \frac{1}{2}(8)^2 = 107 $$

It is apparent from this example that finding the spectral density and mean-square
value of random discrete frequency components is quite simple and straightforward.
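
The bookkeeping in this example is easy to mechanize. The following sketch builds the δ-function areas for a dc term plus sinusoids with independent, uniformly distributed phases, and then recovers the mean-square value. The helper names `line_spectrum` and `mean_square` are hypothetical, introduced only for illustration.

```python
import math

def line_spectrum(dc, sinusoids):
    """Map each discrete frequency to the area of its delta function.

    dc        -- the constant term A (area 2*pi*A**2 at w = 0)
    sinusoids -- list of (amplitude B, frequency w) pairs, each contributing
                 (pi/2)*B**2 at +w and at -w; phases are assumed independent
                 and uniform over 2*pi.
    """
    lines = {0.0: 2.0 * math.pi * dc**2}
    for amplitude, w in sinusoids:
        for freq in (w, -w):
            lines[freq] = lines.get(freq, 0.0) + 0.5 * math.pi * amplitude**2
    return lines

def mean_square(lines):
    """Mean-square value: total delta-function area divided by 2*pi."""
    return sum(lines.values()) / (2.0 * math.pi)

lines = line_spectrum(5.0, [(10.0, 6.0), (8.0, 12.0)])
for freq in sorted(lines):
    print(freq, lines[freq] / math.pi)   # areas in units of pi
print(mean_square(lines))
```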

It is also possible to have spectral densities with both a continuous component
and discrete components. An example of this sort that arises frequently in
connection with communication systems or sampled data control systems is the random
amplitude pulse sequence shown in Figure 7-2. It is assumed here that all of the
pulses have the same shape but their amplitudes are random variables that are
statistically independent from pulse to pulse. However, all the amplitude variables
have the same mean, Ȳ, and the same variance, σ_Y². The repetition period for the
pulses is t₁, a constant, and the reference time for any sample function is t₀, which
is a random variable uniformly distributed over an interval of t₁.

The complete derivation of the spectral density is too lengthy to be included
here, but the final result indicates some interesting points. This result may be
expressed in terms of the Fourier transform F(ω) of the basic pulse shape f(t),
and is

$$ S_X(\omega) = |F(\omega)|^2 \left[ \frac{\sigma_Y^2}{t_1} + \frac{2\pi \bar{Y}^2}{t_1^2} \sum_{n=-\infty}^{\infty} \delta\!\left( \omega - \frac{2\pi n}{t_1} \right) \right] \tag{7-25} $$


Figure 7-2 Random amplitude pulse sequence.

Figure 7-3 Spectral density for rectangular pulse sequence with random amplitudes.

If the basic pulse shape is rectangular, with a width of t₂, the corresponding
spectral density will be as shown in Figure 7-3. From (7-25) the following general
conclusions are possible:

1. Both the continuous spectrum amplitude and the areas of the δ functions
are proportional to the squared magnitude of the Fourier transform of the
basic pulse shape.


2. If the mean value of the pulse amplitude is zero, there will be no discrete 
spectrum even though the pulses occur periodically. 

3. If the variance of the pulse amplitude is zero, there will be no continuous 
spectrum. 


The above result is illustrated by considering a sequence of rectangular pulses
having random amplitudes. Let each pulse have the form

$$ f(t) = \begin{cases} 1, & -0.01 \le t \le 0.01 \\ 0, & \text{elsewhere} \end{cases} $$

and assume that these are repeated periodically every 0.1 second and have
independent random amplitudes that are uniformly distributed between 0 and 12. The
first step is to find the Fourier transform of the pulse shape. This is

$$ F(\omega) = \int_{-0.01}^{0.01} (1)\, e^{-j\omega t}\, dt = 0.02\, \frac{\sin 0.01\omega}{0.01\omega} $$

Next we need to find the mean and variance of the random amplitudes. Since the
amplitudes are uniformly distributed the mean value is

$$ \bar{Y} = \tfrac{1}{2}(0 + 12) = 6 $$

and the variance is

$$ \sigma_Y^2 = \tfrac{1}{12}(12 - 0)^2 = 12 $$


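The spectral density may now be obtained from (7-25) as

$$ S_X(\omega) = \left[ 0.02\, \frac{\sin 0.01\omega}{0.01\omega} \right]^2 \left[ \frac{12}{0.1} + \frac{2\pi(6)^2}{(0.1)^2} \sum_{n=-\infty}^{\infty} \delta(\omega - 20\pi n) \right] $$

$$ = \left[ \frac{\sin 0.01\omega}{0.01\omega} \right]^2 \left[ 0.048 + 2.88\pi \sum_{n=-\infty}^{\infty} \delta(\omega - 20\pi n) \right] $$
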

Again it may be seen that there is a continuous part to the spectral density as well 
as an infinite number of discrete frequency components. 
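
The constants in this result follow directly from (7-25). The sketch below recomputes them for a rectangular pulse of half-width 0.01 s, a period of 0.1 s, and amplitudes uniform on (0, 12). The function and variable names are illustrative assumptions.

```python
import math

def pulse_train_coefficients(lo, hi, half_width, period):
    """Constants of (7-25) for rectangular pulses with uniform random amplitudes.

    Returns (mean, variance, continuous coefficient, delta-function coefficient,
    spacing of the discrete lines), with the factor F(0)**2 = (2*half_width)**2
    folded into the two coefficients.
    """
    mean = 0.5 * (lo + hi)
    variance = (hi - lo) ** 2 / 12.0
    peak_sq = (2.0 * half_width) ** 2                 # F(0)**2
    continuous = peak_sq * variance / period
    line_coeff = peak_sq * 2.0 * math.pi * mean**2 / period**2
    line_spacing = 2.0 * math.pi / period
    return mean, variance, continuous, line_coeff, line_spacing

mean, variance, continuous, line_coeff, line_spacing = pulse_train_coefficients(
    0.0, 12.0, 0.01, 0.1)
print(mean, variance, continuous, line_coeff / math.pi, line_spacing / math.pi)
```

Setting `lo = -hi` makes the mean zero and removes the discrete lines, while setting `lo = hi` makes the variance zero and removes the continuous part, in agreement with conclusions 2 and 3.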

Another property of spectral densities concerns the derivative of the random
process. Suppose that Ẋ(t) = dX(t)/dt and that X(t) has a spectral density of
S_X(ω), which was defined as

$$ S_X(\omega) = \lim_{T\to\infty} \frac{E[|F_X(\omega)|^2]}{2T} $$

The truncated version of the derivative, Ẋ_T(t), will have a Fourier transform of
jωF_X(ω), with the possible addition of two constant terms (arising from the
discontinuities at ±T) that will vanish in the limit. Hence, the spectral density of
the derivative becomes

$$ S_{\dot{X}}(\omega) = \lim_{T\to\infty} \frac{E[j\omega F_X(\omega)(-j\omega)F_X(-\omega)]}{2T} = \omega^2 \lim_{T\to\infty} \frac{E[|F_X(\omega)|^2]}{2T} = \omega^2 S_X(\omega) \tag{7-26} $$

It is seen, therefore, that differentiation creates a new process whose spectral
density is simply ω² times the spectral density of the original process. In this
connection, it should be noted that if S_X(ω) is finite at ω = 0, then S_Ẋ(ω) will be
zero at ω = 0. Furthermore, if S_X(ω) does not drop off more rapidly than 1/ω²
as ω → ∞, then S_Ẋ(ω) will approach a constant at large ω and the mean-square
value for the derivative will be infinite. This corresponds to the case of
nondifferentiable random processes.
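
A short numerical illustration of (7-26) may help. The density used below, S_X(ω) = 2Aβ/(ω² + β²), is an assumed example chosen because it falls off exactly as 1/ω²; the parameter values are arbitrary.

```python
def s_x(w, A=1.0, beta=2.0):
    """Assumed example density 2*A*beta / (w**2 + beta**2)."""
    return 2.0 * A * beta / (w**2 + beta**2)

def s_xdot(w, A=1.0, beta=2.0):
    """Spectral density of the derivative, from (7-26): w**2 times S_X(w)."""
    return w**2 * s_x(w, A, beta)

for w in (0.0, 1.0, 10.0, 1.0e3, 1.0e6):
    print(w, s_x(w), s_xdot(w))
```

The derivative's density is zero at ω = 0 and levels off at 2Aβ for large ω, so its area, and hence the mean-square value of the derivative, is unbounded.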





Exercise 7—3.1 


A stationary random process has a spectral density of the form

$$ S_X(\omega) = 8\pi\delta(\omega) + 36\pi\delta(\omega - 16) + 36\pi\delta(\omega + 16) $$

a) List all of the discrete frequencies present in this process.

b) Find the mean value of the process.

c) Find the variance of the process.

Answers: 0, ±2, ±16, 36


Exercise 7—3.2 


A random process consists of a sequence of rectangular pulses hav- 
ing a duration of 1 millisecond and occurring every 5 milliseconds. The 
pulse amplitudes are independent random variables that are uniformly 
distributed between A and B. For each of the following sets of values 
for A and B, determine if the spectral density has a continuous com- 
ponent, discrete components, both, or neither. 


a) A = −5, B = 5
b) A = 5, B = 15
c) A = 8, B = 8
d A20,B-8 


Answers: Both, neither, discrete only, continuous only 





7—4 Spectral Density and the Complex 
Frequency Plane 


In the discussion so far, the spectral density has been expressed as a function of
the real angular frequency ω. However, for applications to system analysis, it is
very convenient to express it in terms of the complex frequency s, since system
transfer functions are more convenient in this form. This change can be made very
simply by replacing jω with s. Hence, along the jω-axis of the complex frequency
plane, the spectral density will be the same as that already discussed.

The formal conversion to complex frequency representation is accomplished by
replacing ω by −js or ω² by −s². The resulting spectral density should properly
be designated as S_X(−js), but this notation is somewhat clumsy. Therefore,
spectral density in the s-plane will be designated simply as S_X(s). It is evident that
S_X(s) and S_X(ω) are somewhat different functions of their respective arguments,
so that the notation is symbolic rather than precise.

For the special case of rational spectral densities, in which only even powers of
ω occur, this substitution is equivalent to replacing ω² by −s². For example,
consider the rational spectrum

$$ S_X(\omega) = \frac{10(\omega^2 + 5)}{\omega^4 + 10\omega^2 + 24} $$

When expressed as a function of s, this becomes

$$ S_X(s) = S_X(-js) = \frac{10(-s^2 + 5)}{s^4 - 10s^2 + 24} \tag{7-27} $$


Any spectral density can also be represented (except for a constant of
proportionality) in terms of its pole-zero configuration in the complex frequency plane.
Such a representation is often convenient in carrying out certain calculations,
which will be discussed in the following sections. For purposes of illustration,
consider the spectral density of (7-27). This may be factored as

$$ S_X(s) = \frac{-10(s + \sqrt{5})(s - \sqrt{5})}{(s + 2)(s - 2)(s + \sqrt{6})(s - \sqrt{6})} $$

and the pole-zero configuration plotted as shown in Figure 7-4. This plot also
illustrates the important point that such configurations are always symmetrical
about the jω axis. When the spectral density is not rational, the substitution is the
same but may not be quite as straightforward. For example, the spectral density
given by (7-25) could be expressed in the complex frequency plane as

$$ S_X(s) = F(s)F(-s) \left[ \frac{\sigma_Y^2}{t_1} + \frac{2\pi \bar{Y}^2}{t_1^2} \sum_{n=-\infty}^{\infty} \delta\!\left( s - j\frac{2\pi n}{t_1} \right) \right] \tag{7-28} $$

where F(s) is the Laplace transform of the basic pulse shape f(t).

In addition to making spectral densities more convenient for system analysis, 
the use of the complex frequency s also makes it more convenient to evaluate 
mean-square values. This application is discussed in the following section. 
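
For rational densities the pole and zero locations can be found by solving for the roots in ω² and then mapping each root u to s = ±√(−u). The sketch below does this for the numerator ω² + 5 and denominator ω⁴ + 10ω² + 24 of the example; the helper names are hypothetical.

```python
import cmath

def quadratic_roots(a, b, c):
    """Roots of a*u**2 + b*u + c = 0 (complex arithmetic throughout)."""
    disc = cmath.sqrt(b * b - 4.0 * a * c)
    return [(-b + disc) / (2.0 * a), (-b - disc) / (2.0 * a)]

def s_plane_roots(u_roots):
    """Map each root u of a polynomial in u = w**2 to the pair s = +/- sqrt(-u),
    which follows from the substitution w**2 = -s**2."""
    out = []
    for u in u_roots:
        s = cmath.sqrt(-u)
        out.extend([s, -s])
    return out

zeros = s_plane_roots([-5.0])                             # from w**2 + 5 = 0
poles = s_plane_roots(quadratic_roots(1.0, 10.0, 24.0))   # from w**4 + 10*w**2 + 24 = 0
print(sorted(z.real for z in zeros))
print(sorted(p.real for p in poles))
```

Every root appears together with its negative, which is the symmetry about the jω axis noted in the text.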





Exercise 7—4.1 


A stationary random process has a spectral density of the form

$$ S_X(\omega) = \frac{25(\omega^2 + 16)}{\omega^4 + 34\omega^2 + 225} $$

Find the pole and zero locations for this spectral density in the complex
frequency plane.

Answers: ±3, ±4, ±5

Figure 7-4 Pole-zero configuration for a spectral density.


Exercise 7—4.2 


A stationary random process has a spectral density of the form

$$ S_X(\omega) = \frac{\omega^2(\omega^2 + 25)}{\omega^6 - 33\omega^4 + 463\omega^2 + 7569} $$

a) Verify that this spectral density is positive for all values of ω.

b) Find the pole and zero locations for this spectral density in the
complex frequency plane.

Answers: 0, ±3, ±5, ±2 ± j5





7—5 Mean-Square Values from Spectral 
Density 


It was shown in the course of defining the spectral density that the mean-square
value of the random process was given by

$$ \overline{X^2} = \frac{1}{2\pi} \int_{-\infty}^{\infty} S_X(\omega)\, d\omega \tag{7-11} $$

Hence, the mean-square value is proportional to the area of the spectral density.

The evaluation of an integral such as (7-11) may be very difficult if the spectral
density has a complicated form or if it involves high powers of ω. A classical
way of carrying out such integration is to convert the variable of integration to a
complex variable (by substituting s for jω) and then to utilize some powerful
theorems concerning integration around closed paths in the complex plane. This
is probably the easiest and most satisfactory way of obtaining mean-square values
but, unfortunately, requires a knowledge of complex variables that the reader may
not possess. The mechanics of the procedure are discussed at the end of this
section, however, for those interested in this method.
An alternative method, which will be discussed first, is to utilize some tabulated
results for spectral densities that are rational. These have been tabulated in general
form for polynomials of various degrees and their use is simply a matter of
substituting in the appropriate numbers. The existence of such general forms is
primarily a consequence of the symmetry of the spectral density. As a result of this
symmetry, it is always possible to factor rational spectral densities into the form

$$ S_X(s) = \frac{c(s)c(-s)}{d(s)d(-s)} \tag{7-29} $$

where c(s) contains the left-half-plane (lhp) zeros, c(−s) the right-half-plane (rhp)
zeros, d(s) the lhp poles, and d(−s) the rhp poles.

When the real integration of (7-11) is expressed in terms of the complex
variable s, the mean-square value becomes

$$ \overline{X^2} = \frac{1}{2\pi j} \int_{-j\infty}^{j\infty} S_X(s)\, ds = \frac{1}{2\pi j} \int_{-j\infty}^{j\infty} \frac{c(s)c(-s)}{d(s)d(-s)}\, ds \tag{7-30} $$

For the special case of rational spectral densities, c(s) and d(s) are polynomials in
s and may be written as

$$ c(s) = c_{n-1}s^{n-1} + c_{n-2}s^{n-2} + \cdots + c_0 $$

$$ d(s) = d_n s^n + d_{n-1}s^{n-1} + \cdots + d_0 $$

Some of the coefficients of c(s) may be zero, but d(s) must be of higher degree
than c(s) and must not have any coefficients missing.

Integrals of the form in (7-30) have been tabulated for values of n up to 10,
although beyond n = 3 or 4 the general results are so complicated as to be of
doubtful value. An abbreviated table is given in Table 7-1.

Table 7-1. Table of Integrals.

$$ I_n = \frac{1}{2\pi j} \int_{-j\infty}^{j\infty} \frac{c(s)c(-s)}{d(s)d(-s)}\, ds $$

$$ c(s) = c_{n-1}s^{n-1} + c_{n-2}s^{n-2} + \cdots + c_0 $$

$$ d(s) = d_n s^n + d_{n-1}s^{n-1} + \cdots + d_0 $$

$$ I_1 = \frac{c_0^2}{2 d_0 d_1} $$

$$ I_2 = \frac{c_1^2 d_0 + c_0^2 d_2}{2 d_0 d_1 d_2} $$

$$ I_3 = \frac{c_2^2 d_0 d_1 + (c_1^2 - 2c_0 c_2) d_0 d_3 + c_0^2 d_2 d_3}{2 d_0 d_3 (d_1 d_2 - d_0 d_3)} $$
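As an example of this calculation, consider the spectral density

$$ S_X(\omega) = \frac{\omega^2 + 4}{\omega^4 + 10\omega^2 + 9} $$

When ω is replaced by −js, this becomes

$$ S_X(s) = \frac{-(s^2 - 4)}{s^4 - 10s^2 + 9} = \frac{-(s^2 - 4)}{(s^2 - 1)(s^2 - 9)} \tag{7-31} $$

This can be factored into

$$ S_X(s) = \frac{(s + 2)(-s + 2)}{(s + 1)(s + 3)(-s + 1)(-s + 3)} \tag{7-32} $$

from which it is seen that

$$ c(s) = s + 2 $$

$$ d(s) = (s + 1)(s + 3) = s^2 + 4s + 3 $$

This is a case in which n = 2 and

$$ c_1 = 1, \quad c_0 = 2, \quad d_2 = 1, \quad d_1 = 4, \quad d_0 = 3 $$

From Table 7-1, I₂ is given by

$$ I_2 = \frac{c_1^2 d_0 + c_0^2 d_2}{2 d_0 d_1 d_2} = \frac{(1)^2(3) + (2)^2(1)}{2(3)(4)(1)} = \frac{3 + 4}{24} = \frac{7}{24} $$

However, the mean-square value is equal to I₂, so that

$$ \overline{X^2} = \frac{7}{24} $$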

The procedure just presented is a mechanical one and in order to be a useful tool 
does not require any deep understanding of the theory. Some precautions are nec- 
essary, however. In the first place, as noted above, it is necessary that c(s) be of 
lower degree than d(s). Second, it is necessary that c(s) and d(s) have roots only 
in the left half plane. Finally, it is necessary that d(s) have no roots on the jω-
axis.
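
Since the use of the table is purely mechanical, it is easily written as a small routine. The sketch below implements the three entries I₁, I₂, and I₃ with coefficients listed lowest power first; the function name and argument convention are assumptions made for illustration. It rejects missing coefficients in d(s) but does not check that the roots of c(s) and d(s) lie in the left half plane, which remains the user's responsibility.

```python
def table_7_1_integral(c, d):
    """Evaluate I_n of Table 7-1 for n = 1, 2, or 3.

    c = [c0, c1, ..., c_{n-1}] and d = [d0, d1, ..., d_n], lowest power first.
    """
    n = len(d) - 1
    if n not in (1, 2, 3) or len(c) != n:
        raise ValueError("need len(d) in 2..4 and len(c) == len(d) - 1")
    if any(coef == 0 for coef in d):
        raise ValueError("d(s) must not have missing coefficients")
    if n == 1:
        return c[0] ** 2 / (2 * d[0] * d[1])
    if n == 2:
        return (c[1] ** 2 * d[0] + c[0] ** 2 * d[2]) / (2 * d[0] * d[1] * d[2])
    num = (c[2] ** 2 * d[0] * d[1]
           + (c[1] ** 2 - 2 * c[0] * c[2]) * d[0] * d[3]
           + c[0] ** 2 * d[2] * d[3])
    return num / (2 * d[0] * d[3] * (d[1] * d[2] - d[0] * d[3]))

# Example from the text: c(s) = s + 2, d(s) = s**2 + 4s + 3
print(table_7_1_integral([2, 1], [3, 4, 1]))
```

The printed value can be compared with the 7/24 obtained by hand for this example.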

In the example given above the spectral density is rational and, hence, does not
contain any δ functions. Thus, the random process that it represents has a mean
value of zero and the mean-square value that was calculated is also the variance
of the process. There may be situations, however, in which the continuous part of
the spectral density is rational but there are also discrete components resulting
from a nonzero mean value or from periodic components. In cases such as this, it
is necessary to treat the continuous portion of the spectral density and the discrete
portions of the spectral density separately when finding the mean-square value.
An example will serve to illustrate the technique. Consider a spectral density of
the form

$$ S_X(\omega) = 8\pi\delta(\omega) + 36\pi\delta(\omega - 16) + 36\pi\delta(\omega + 16) + \frac{25(\omega^2 + 16)}{\omega^4 + 34\omega^2 + 225} $$


From the discussion in Section 7-3 and Equation (7-24), it is clear that the
contribution to the mean-square value from the discrete components is simply

$$ (\overline{X^2})_{\text{discrete}} = \frac{1}{2\pi}(8\pi + 36\pi + 36\pi) = 40 $$

Note that this includes a mean value of ±2. The continuous portion of the spectral
density may now be written as a function of s as

$$ S_c(s) = \frac{25(-s^2 + 16)}{s^4 - 34s^2 + 225} $$

which, in factored form, becomes

$$ S_c(s) = \frac{[5(s + 4)][5(-s + 4)]}{[(s + 3)(s + 5)][(-s + 3)(-s + 5)]} $$

It is now clear that

$$ c(s) = 5(s + 4) = 5s + 20 $$

from which c₀ = 20 and c₁ = 5. Also

$$ d(s) = (s + 3)(s + 5) = s^2 + 8s + 15 $$

from which d₀ = 15, d₁ = 8, and d₂ = 1. Using the expression for I₂ in Table
7-1 yields

$$ (\overline{X^2})_c = \frac{(5)^2(15) + (20)^2(1)}{2(15)(8)(1)} = 3.229 $$

Hence, the total mean-square value of this process is

$$ \overline{X^2} = 40 + 3.229 = 43.229 $$

Since the mean value of the process is ±2, the variance of the process becomes
σ_X² = 43.229 − (2)² = 39.229.
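
The arithmetic of this mixed example can be reproduced in a few lines. The sketch below treats the δ-function areas and the rational part separately, exactly as described; the variable names are illustrative.

```python
import math

# Discrete part: areas of the delta functions at w = 0, +16, -16
delta_areas = [8.0 * math.pi, 36.0 * math.pi, 36.0 * math.pi]
discrete_part = sum(delta_areas) / (2.0 * math.pi)

# Continuous part: I_2 with c(s) = 5s + 20 and d(s) = s**2 + 8s + 15
c0, c1 = 20.0, 5.0
d0, d1, d2 = 15.0, 8.0, 1.0
continuous_part = (c1**2 * d0 + c0**2 * d2) / (2.0 * d0 * d1 * d2)

mean_square = discrete_part + continuous_part
mean_magnitude = math.sqrt(delta_areas[0] / (2.0 * math.pi))   # from 2*pi*A**2
variance = mean_square - mean_magnitude**2

print(discrete_part, round(continuous_part, 3), round(mean_square, 3), round(variance, 3))
```

Only the δ function at ω = 0 contributes to the mean; the sinusoidal components at ±16 add to the variance along with the continuous part.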

It was noted previously that the use of complex integration provides a very
general, and very powerful, method of evaluating integrals of the form given in
(7-30). A brief summary of the theory of such integration is given in Appendix
I, and these ideas will be utilized here to demonstrate another method of
evaluating mean-square values from spectral density. As a means of acquainting the
student with the potential usefulness of this general procedure, only the mechanics
of this method are discussed. The student should be aware, however, that there
are many pitfalls associated with using mathematical tools without having a
thorough grasp of their theory. All students are encouraged to acquire the proper
theoretical understanding as soon as possible.

Figure 7-5 Pole-zero configuration for a spectral density.

The method considered here is based on the evaluation of residues, in much the
same way as is done in connection with finding inverse Laplace transforms.
Consider, for example, the spectral density given above in (7-31) and (7-32). This
spectral density may be represented by the pole-zero configuration shown in
Figure 7-5. The path of integration called for by (7-30) is along the jω-axis, but the
methods of complex integration discussed in Appendix I require a closed path.
Such a closed path can be obtained by adding a semicircle at infinity that encloses
either the left half plane or the right half plane. Less difficulty with the algebraic
signs is encountered if the left half plane is used, so the path shown in Figure 7-6
will be assumed from now on. In order for the integral around this closed path
to be the same as the integral along the jω-axis, it is necessary for the contribution
due to the semicircle to vanish as R → ∞. For rational spectral densities this will
be true whenever the denominator polynomial is of higher degree than the
numerator polynomial (since only even powers are present).

Figure 7-6 Path of integration for evaluating mean-square value.
A basic result of complex variable theory states that the value of an integral
around a simple closed contour in the complex plane is equal to 2πj times the
sum of the residues at the poles contained within that contour (see (1-3), Appendix
I). Since the expression for the mean-square value has a factor of 1/(2πj), and
since the chosen contour completely encloses the left half plane, it follows that
the mean-square value can be expressed in general as

$$ \overline{X^2} = \sum (\text{residues at lhp poles}) \tag{7-33} $$

For the example being considered, the only lhp poles are at −1 and −3. The
residues can be evaluated easily by multiplying S_X(s) by the factor containing the
pole in question and letting s assume the value of the pole. Thus,

$$ K_{-1} = [(s + 1)S_X(s)]_{s=-1} = \left[ \frac{-(s + 2)(s - 2)}{(s - 1)(s + 3)(s - 3)} \right]_{s=-1} = \frac{3}{16} $$

$$ K_{-3} = [(s + 3)S_X(s)]_{s=-3} = \left[ \frac{-(s + 2)(s - 2)}{(s + 1)(s - 1)(s - 3)} \right]_{s=-3} = \frac{5}{48} $$

From (7-33) it follows that

$$ \overline{X^2} = \frac{3}{16} + \frac{5}{48} = \frac{7}{24} $$

which is the same value obtained above.

If the poles are not simple, the more general procedures discussed in Appendix 
I may be employed for evaluating the residues. However, the mean-square value 
is still obtained from (7—33). 
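
The residue calculation for the example S_X(s) = (−s² + 4)/(s⁴ − 10s² + 9) can be checked with exact rational arithmetic. Note that the sketch below does not cancel the pole factor as the text does; it uses the equivalent formula for a simple pole, residue = N(p)/D′(p), where N and D are the numerator and denominator polynomials. The function names are illustrative.

```python
from fractions import Fraction

def numerator(s):
    return -s**2 + 4                 # N(s) = -s^2 + 4

def denominator_derivative(s):
    return 4 * s**3 - 20 * s         # d/ds of D(s) = s^4 - 10 s^2 + 9

def residue_at_simple_pole(pole):
    """Residue of N(s)/D(s) at a simple pole, computed as N(p)/D'(p)."""
    p = Fraction(pole)
    return numerator(p) / denominator_derivative(p)

lhp_poles = [-1, -3]
residues = [residue_at_simple_pole(p) for p in lhp_poles]
mean_square = sum(residues)
print(residues, mean_square)
```

This shortcut applies only to simple poles; repeated poles require the more general procedures of Appendix I.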





Exercise 7—5.1 


A stationary random process has a spectral density of

$$ S_X(\omega) = \frac{16}{\omega^4 + 13\omega^2 + 36} $$

a) Find the mean-square value of this process using the results of
Table 7-1.

b) Repeat using contour integration.

Answer: 4/15
Exercise 7-5.2


In discussing the use of Table 7—1, the statement is made that the 
polynomial d(s) must not have any missing coefficients. 


a) Explain why this condition is necessary. 


b) Check your conclusion by first verifying that the following is a 
valid spectral density and then attempting to use the table to 
evaluate the mean-square value. 


wr + 1 


= ae A 





7—6 Relation of Spectral Density to the 
Autocorrelation Function 


The autocorrelation function is shown in Chapter 6 to be the expected value of 
the product of time functions. In this chapter, it has been shown that the spectral 
density is related to the expected value of the product of Fourier transforms. It 
would appear, therefore, that there should be some direct relationship between 
these two expected values. Almost intuitively one would expect the spectral den- 
sity to be the Fourier (or Laplace) transform of the autocorrelation function, and 
this turns out to be the case. 

We consider first the case of a nonstationary random process and then specialize
the result to a stationary process. In (7-10) the spectral density was defined as

$$ S_X(\omega) = \lim_{T\to\infty} \frac{E[|F_X(\omega)|^2]}{2T} \tag{7-10} $$

where F_X(ω) is the Fourier transform of the truncated sample function. Thus,

$$ F_X(\omega) = \int_{-\infty}^{\infty} X_T(t)\, e^{-j\omega t}\, dt, \qquad T < \infty \tag{7-34} $$

Substituting (7-34) into (7-10) yields

$$ S_X(\omega) = \lim_{T\to\infty} \frac{1}{2T} E\left[ \int_{-\infty}^{\infty} X_T(t_1)\, e^{j\omega t_1}\, dt_1 \int_{-\infty}^{\infty} X_T(t_2)\, e^{-j\omega t_2}\, dt_2 \right] \tag{7-35} $$

since |F_X(ω)|² = F_X(ω)F_X(−ω). The subscripts on t₁ and t₂ have been introduced
so that we can distinguish the variables of integration when the product of
integrals is rewritten as an iterated double integral. Thus, write (7-35) as

$$ S_X(\omega) = \lim_{T\to\infty} \frac{1}{2T} E\left[ \int_{-\infty}^{\infty} dt_2 \int_{-\infty}^{\infty} e^{-j\omega(t_2 - t_1)} X_T(t_1)X_T(t_2)\, dt_1 \right] $$

$$ = \lim_{T\to\infty} \frac{1}{2T} \int_{-\infty}^{\infty} dt_2 \int_{-\infty}^{\infty} e^{-j\omega(t_2 - t_1)} E[X_T(t_1)X_T(t_2)]\, dt_1 \tag{7-36} $$

Moving the expectation operation inside the double integral can be shown to be
valid in this case, but the details are not discussed here.

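The expectation in the integrand above is recognized as the autocorrelation
function of the truncated process. Thus,

$$ E[X_T(t_1)X_T(t_2)] = \begin{cases} R_X(t_1, t_2), & |t_1|, |t_2| \le T \\ 0, & \text{elsewhere} \end{cases} \tag{7-37} $$

Making the substitution

$$ t_2 - t_1 = \tau, \qquad dt_2 = d\tau $$

we can write (7-36) as

$$ S_X(\omega) = \lim_{T\to\infty} \frac{1}{2T} \int_{-T-t_1}^{T-t_1} d\tau \int_{-T}^{T} e^{-j\omega\tau} R_X(t_1, t_1 + \tau)\, dt_1 $$

where the limits on t₁ are imposed by (7-37). Interchanging the order of
integration and moving the limit inside the τ-integral gives

$$ S_X(\omega) = \int_{-\infty}^{\infty} \left\{ \lim_{T\to\infty} \frac{1}{2T} \int_{-T}^{T} R_X(t_1, t_1 + \tau)\, dt_1 \right\} e^{-j\omega\tau}\, d\tau \tag{7-38} $$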

From (7-38) it is apparent that the spectral density is the Fourier transform of the
time average of the autocorrelation function. This may be expressed in shorter
notation as follows:

$$ S_X(\omega) = \mathcal{F}\{\langle R_X(t, t + \tau) \rangle\} \tag{7-39} $$

The relationship given in (7-39) is valid for nonstationary processes also.
If the process in question is a stationary random process, the autocorrelation
function is independent of time; therefore,

$$ \langle R_X(t_1, t_1 + \tau) \rangle = R_X(\tau) $$

Accordingly, the spectral density of a wide-sense stationary random process is just
the Fourier transform of the autocorrelation function; that is,

$$ S_X(\omega) = \int_{-\infty}^{\infty} R_X(\tau)\, e^{-j\omega\tau}\, d\tau = \mathcal{F}\{R_X(\tau)\} \tag{7-40} $$


The relationship in (7—40), which is known as the Wiener-Khinchine relation, 
is of fundamental importance in analyzing random signals because it provides the 
link between the time domain (correlation function) and the frequency domain 
(spectral density). Because of the uniqueness of the Fourier transform it follows 
that the autocorrelation function of a wide-sense stationary random process is the 
inverse transform of the spectral density. In the case of a nonstationary process, 
the autocorrelation function cannot be recovered from the spectral density—only 
the time average of the correlation function, as seen from (7—39). In subsequent 
discussions, we will deal only with wide-sense stationary random processes for 
which (7—40) is valid. 

As a simple example of this result, consider an autocorrelation function of the
form

$$ R_X(\tau) = A e^{-\beta|\tau|}, \qquad A > 0,\ \beta > 0 $$

The absolute value sign on τ is required by the symmetry of the autocorrelation
function. This function is shown in Figure 7-7 (a) and is seen to have a
discontinuous derivative at τ = 0. Hence, it is necessary to write (7-40) as the sum of
two integrals, one for negative values of τ and one for positive values of τ. Thus,

$$ S_X(\omega) = \int_{-\infty}^{0} A e^{\beta\tau} e^{-j\omega\tau}\, d\tau + \int_{0}^{\infty} A e^{-\beta\tau} e^{-j\omega\tau}\, d\tau $$

$$ = \left. \frac{A e^{(\beta - j\omega)\tau}}{\beta - j\omega} \right|_{-\infty}^{0} + \left. \frac{A e^{-(\beta + j\omega)\tau}}{-(\beta + j\omega)} \right|_{0}^{\infty} $$

$$ = A\left[ \frac{1}{\beta - j\omega} + \frac{1}{\beta + j\omega} \right] = \frac{2A\beta}{\omega^2 + \beta^2} \tag{7-41} $$

Figure 7-7 Relation between (a) autocorrelation function and (b) spectral density.

This spectral density is shown in Figure 7-7 (b).
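
The transform pair for the exponential autocorrelation function can be confirmed by evaluating the Fourier integral numerically. The sketch below truncates the integral at |τ| = 20 and uses a midpoint rule; the values A = 3 and β = 2, the truncation limit, and the function names are all illustrative assumptions.

```python
import cmath
import math

def spectral_density_numeric(w, A, beta, tau_max=20.0, n=40_000):
    """Midpoint-rule estimate of the integral of A*exp(-beta*|tau|)*exp(-j*w*tau)
    over [-tau_max, tau_max]."""
    h = 2.0 * tau_max / n
    total = 0j
    for k in range(n):
        tau = -tau_max + (k + 0.5) * h
        total += A * math.exp(-beta * abs(tau)) * cmath.exp(-1j * w * tau)
    return total * h

def spectral_density_closed_form(w, A, beta):
    return 2.0 * A * beta / (w**2 + beta**2)

A, beta = 3.0, 2.0
for w in (0.0, 1.0, 5.0):
    estimate = spectral_density_numeric(w, A, beta)
    print(w, estimate.real, estimate.imag, spectral_density_closed_form(w, A, beta))
```

The imaginary part of each estimate is negligible, reflecting the fact that an even autocorrelation function has a real spectral density.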

In the stationary case it is also possible to find the autocorrelation function
corresponding to a given spectral density by using the inverse Fourier transform.
Thus,

$$ R_X(\tau) = \frac{1}{2\pi} \int_{-\infty}^{\infty} S_X(\omega)\, e^{j\omega\tau}\, d\omega \tag{7-42} $$


An example of the application of this result will be given in the next section. 

In obtaining the result in (7-41), the integral was separated into two parts
because of the discontinuous slope at the origin. An alternative procedure, which is
possible in all cases, is to take advantage of the symmetry of autocorrelation
functions. Thus, if (7-40) is written as

$$ S_X(\omega) = \int_{-\infty}^{\infty} R_X(\tau)[\cos \omega\tau - j \sin \omega\tau]\, d\tau $$

by expressing the exponential in terms of sines and cosines, it may be noted that
R_X(τ) sin ωτ is an odd function of τ and, hence, will integrate to zero. On the
other hand, R_X(τ) cos ωτ is even, and the integral from −∞ to +∞ is just twice
the integral from 0 to +∞. Hence,

$$ S_X(\omega) = 2 \int_{0}^{\infty} R_X(\tau) \cos \omega\tau\, d\tau \tag{7-43} $$

is an alternative form that does not require integrating over the origin. The
corresponding inversion formula, for wide-sense stationary processes, is easily shown
to be

$$ R_X(\tau) = \frac{1}{\pi} \int_{0}^{\infty} S_X(\omega) \cos \omega\tau\, d\omega \tag{7-44} $$


It was noted earlier that the relationship between spectral density and correlation
function can also be expressed in terms of the Laplace transform. However, it
should be recalled that the form of the Laplace transform used most often in
system analysis requires that the time function being transformed be zero for
negative values of time. Autocorrelation functions can never be zero for negative
values of τ since they are always even functions of τ. Hence, it is necessary to
use the two-sided Laplace transform for this application. The corresponding
transform pair may be written as

$$ S_X(s) = \int_{-\infty}^{\infty} R_X(\tau)\, e^{-s\tau}\, d\tau \tag{7-45} $$

and

$$ R_X(\tau) = \frac{1}{2\pi j} \int_{-j\infty}^{j\infty} S_X(s)\, e^{s\tau}\, ds \tag{7-46} $$

Since the spectral density of a process having a finite mean-square value can have
no poles on the jω-axis, the path of integration in (7-46) can always be on the
jω-axis.

The direct two-sided Laplace transform, which yields the spectral density from 
the autocorrelation function, is no different from the ordinary one-sided Laplace 
transform and does not require any special comment. However, the inverse two- 
sided Laplace transform does require a little more care so that a simple example 
of this operation is desirable. 

Consider the spectral density found in (7-41) and write it as a function of s as


$$S_X(s) = \frac{-2A\beta}{s^2 - \beta^2} = \frac{-2A\beta}{(s + \beta)(s - \beta)}$$


in which there is one pole in the left-half plane and one pole in the right-half
plane. Because of the symmetry of spectral densities, there will always be as
many rhp poles as there are lhp poles. A partial fraction expansion of the above
expression yields

$$S_X(s) = \frac{A}{s + \beta} - \frac{A}{s - \beta}$$

The inverse Laplace transform of the lhp terms in any partial fraction expansion
is usually interpreted to represent a time function that exists in positive time only.
Hence, in this case we can interpret the inverse transform of the lhp term above
to be

$$\frac{A}{s + \beta} \Leftrightarrow A e^{-\beta\tau} \qquad \tau > 0$$

Because we are dealing with an autocorrelation function here it is possible to
use the property that such functions are even in order to obtain the value of the
autocorrelation function for negative values of $\tau$. However, it is useful to discuss
a more general technique that can also be used for crosscorrelation functions, in
which this type of symmetry does not exist. Thus, for those factors in the partial
fraction expansion that come from rhp poles, it is always possible to (a) replace s
by $-s$, (b) find the single-sided inverse Laplace transform of what is now an lhp
function, and (c) replace $\tau$ by $-\tau$. Using this procedure on the rhp factor above
yields

$$\frac{-A}{-s - \beta} = \frac{A}{s + \beta} \Leftrightarrow A e^{-\beta\tau}$$

which, upon replacing $\tau$ by $-\tau$, yields

$$A e^{\beta\tau} \qquad \tau < 0$$

Thus, the resulting autocorrelation function is

$$R_X(\tau) = A e^{-\beta|\tau|} \qquad -\infty < \tau < \infty$$

which is exactly the autocorrelation function we started with. The technique
illustrated by this example is sufficiently general to handle transformations from
spectral densities to autocorrelation functions as well as from cross-spectral densities
(which are discussed in a subsequent section) to crosscorrelation functions.
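
The three-step rule for rhp poles can be written as a short routine. The sketch below is an illustration only and assumes real, simple poles; the function name `two_sided_inverse` and the representation of each partial-fraction term as a pair (k, p), meaning k/(s - p), are hypothetical choices, and the values of A and β are arbitrary.

```python
import math

def two_sided_inverse(terms, tau):
    """Evaluate the two-sided inverse Laplace transform of sum k/(s - p).

    terms: list of (k, p) pairs with real, simple poles p != 0.
    lhp poles (p < 0) contribute k*exp(p*tau) for tau >= 0.
    rhp poles (p > 0): replace s by -s, invert, replace tau by -tau,
    which gives -k*exp(p*tau) for tau < 0.
    """
    value = 0.0
    for k, p in terms:
        if p < 0 and tau >= 0:
            value += k * math.exp(p * tau)
        elif p > 0 and tau < 0:
            value += -k * math.exp(p * tau)
    return value

# Partial fraction expansion of (7-41): A/(s + beta) - A/(s - beta).
A, beta = 3.0, 2.0                      # illustrative values
terms = [(A, -beta), (-A, beta)]

for tau in (-1.5, -0.2, 0.0, 0.7):
    print(tau, two_sided_inverse(terms, tau), A * math.exp(-beta * abs(tau)))
```

The value at $\tau = 0$ is taken from the lhp terms here, which is adequate when the function is continuous at the origin, as an autocorrelation function of this kind is.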




Exercise 7—6.1 


A stationary random process has an autocorrelation function of the 
form 


$$R_X(\tau) = 16 e^{-2|\tau|} - 8 e^{-4|\tau|}$$


Find the spectral density of this process.


Answer: $\dfrac{768}{(\omega^2 + 4)(\omega^2 + 16)}$


Exercise 7—6.2 


A stationary random process has a spectral density of the form 


$$S_X(\omega) = \frac{16}{\omega^4 + 13\omega^2 + 36}$$


Find the autocorrelation function of this process.


Answer: $\dfrac{4}{15}\left(3e^{-2|\tau|} - 2e^{-3|\tau|}\right)$





7-7 White Noise


The concept of white noise was mentioned previously. This term is applied to a
spectral density that is constant for all values of $\omega$; that is, $S_X(\omega) = S_0$. It is
interesting to determine the correlation function for such a process. This is best
done by giving the result and verifying its correctness. Consider an autocorrelation
function that is a $\delta$ function of the form


$$R_X(\tau) = S_0\,\delta(\tau)$$

Using this form in (7-40) leads to

$$S_X(\omega) = \int_{-\infty}^{\infty} R_X(\tau) e^{-j\omega\tau}\, d\tau = \int_{-\infty}^{\infty} S_0\,\delta(\tau) e^{-j\omega\tau}\, d\tau = S_0 \tag{7-47}$$


which is the result for white noise. It is clear, therefore, that the autocorrelation
function for white noise is just a $\delta$ function with an area equal to the spectral
density.

It was noted previously that the concept of white noise is fictitious because such 
a process would have an infinite mean-square value, since the area of the spectral 
density is infinite. This same conclusion is also apparent from the correlation 
function. It may be recalled that the mean-square value is equal to the value of 
the autocorrelation function at $\tau = 0$. For a $\delta$ function at the origin, this is also
infinite. Nevertheless, the white-noise concept is an extremely valuable one in the 
analysis of linear systems. It frequently turns out that the random signal input to 
a system has a bandwidth that is much greater than the range of frequencies that 
the system is capable of passing. Under these circumstances, assuming the input 
spectral density to be white may greatly simplify the computation of the system 
response without introducing any significant error. Examples of this sort are dis- 
cussed in Chapters 8 and 9. 

Another concept that is frequently used is that of bandlimited white noise. This 
implies a spectral density that is constant over a finite bandwidth and zero outside 
this frequency range. For example, 


$$S_X(\omega) = \begin{cases} S_0 & |\omega| \le 2\pi W \\ 0 & |\omega| > 2\pi W \end{cases} \tag{7-48}$$


as shown in Figure 7-8(a). This spectral density is also fictitious even though the
mean-square value is finite (in fact, $\overline{X^2} = 2WS_0$). Why? It can be approached
arbitrarily closely, however, and is a convenient form for many analysis problems.





Figure 7-8 Bandlimited white noise: (a) spectral density and (b) autocorrelation
function.

The autocorrelation function for such a process is easily obtained from (7-42).
Thus, 


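$$R_X(\tau) = \frac{1}{2\pi}\int_{-\infty}^{\infty} S_X(\omega) e^{j\omega\tau}\, d\omega = \frac{1}{2\pi}\int_{-2\pi W}^{2\pi W} S_0 e^{j\omega\tau}\, d\omega = \frac{S_0}{2\pi} \left. \frac{e^{j\omega\tau}}{j\tau} \right|_{-2\pi W}^{2\pi W}$$

$$= \frac{S_0}{2\pi}\, \frac{e^{j2\pi W\tau} - e^{-j2\pi W\tau}}{j\tau} = \frac{S_0}{\pi\tau} \sin 2\pi W\tau$$

$$= 2WS_0 \left[\frac{\sin 2\pi W\tau}{2\pi W\tau}\right]$$


This is shown in Figure 7-8(b). Note that in the limit as W approaches infinity
this approaches a $\delta$ function.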

It may be observed from Figure 7—8(b) that random variables from a bandlim- 
ited process are uncorrelated if they are separated in time by any multiple of 1/2W 
seconds. It is known also that bandlimited functions can be represented exactly 
and uniquely by a set of samples taken at a rate of twice the bandwidth. This is 
the so-called sampling theorem. Hence, if a bandlimited function having a flat 
spectral density is to be represented by samples, it appears that these samples will 
be uncorrelated. This lack of correlation among samples may be a significant 
advantage in carrying out subsequent analysis. In particular, the correlation matrix 
defined in Section 6—9 for such sampled processes is a diagonal matrix; that is, 
all terms not on the major diagonal are zero. 
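
The lack of correlation between samples spaced 1/2W apart can be seen by evaluating the bandlimited white-noise autocorrelation function at those spacings. The sketch below is illustrative only; the function name is hypothetical, and the values W = 50 Hz and $S_0$ = 0.01 are assumed so as to match the spectral density of Exercise 7-7.1.

```python
import math

def bandlimited_autocorr(tau, W, S0):
    """R_X(tau) = 2*W*S0 * sin(2*pi*W*tau) / (2*pi*W*tau) for bandlimited white noise."""
    x = 2.0 * math.pi * W * tau
    if x == 0.0:
        return 2.0 * W * S0          # limiting value at the origin (mean-square value)
    return 2.0 * W * S0 * math.sin(x) / x

W, S0 = 50.0, 0.01                   # assumed values, chosen to match Exercise 7-7.1
spacing = 1.0 / (2.0 * W)            # sampling interval of the sampling theorem

# Correlation matrix of four successive samples taken 1/(2W) apart.
n = 4
matrix = [[bandlimited_autocorr((i - j) * spacing, W, S0) for j in range(n)]
          for i in range(n)]
for row in matrix:
    print(["%.3e" % value for value in row])
```

Up to rounding error, every off-diagonal entry is zero and every diagonal entry equals the mean-square value $2WS_0$, which is the diagonal correlation matrix described in the text.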





Exercise 7—7.1 
A stationary random process has a bandlimited white spectral density 
given by 
$$S_X(\omega) = \begin{cases} 0.01 & |\omega| \le 100\pi \\ 0 & |\omega| > 100\pi \end{cases}$$


a) Find the mean-square value of this process.


b) Find the smallest value of $\tau$ for which the autocorrelation function
is 0.


c) Specify the bandwidth of this process in Hz. 


Answers: 0.01, 1, 50

Exercise 7-7.2


It is also possible to have bandlimited white noise processes that are
bandpass in nature. Consider such a process having a spectral density
of

$$S_X(\omega) = \begin{cases} 0.005 & 200\pi \le |\omega| \le 250\pi \\ 0 & \text{elsewhere} \end{cases}$$

a) Sketch this spectral density. 
b) Specify the bandwidth of this process in Hz. 


c) Find the mean-square value of this process. 


Answers: 0.25, 25 





7—8  Cross-Spectral Density 


When two correlated random processes are being considered, such as the input 
and the output of a linear system, it is possible to define a pair of quantities known 
as the cross-spectral densities. For purposes of the present discussion, it is suffi- 
cient to simply define them and note a few of their properties without undertaking 
any formal proofs. 

If $F_X(\omega)$ is the Fourier transform of a truncated sample function from one
process and $F_Y(\omega)$ is a similar transform from the other process, then the two
cross-spectral densities may be defined as


$$S_{XY}(\omega) = \lim_{T \to \infty} \frac{E[F_X(-\omega)F_Y(\omega)]}{2T} \tag{7-49}$$

$$S_{YX}(\omega) = \lim_{T \to \infty} \frac{E[F_Y(-\omega)F_X(\omega)]}{2T} \tag{7-50}$$


Unlike normal spectral densities, cross-spectral densities need not be real,
positive, or even functions of $\omega$. They do have the following properties, however:


1. $S_{XY}(\omega) = S_{YX}^*(\omega)$ (* implies complex conjugate)
2. Re $[S_{XY}(\omega)]$ is an even function of $\omega$. Also true for $S_{YX}(\omega)$.
3. Im $[S_{XY}(\omega)]$ is an odd function of $\omega$. Also true for $S_{YX}(\omega)$.

Cross-spectral densities can also be related to crosscorrelation functions by the
Fourier transform. Thus for jointly stationary processes,


$$S_{XY}(\omega) = \int_{-\infty}^{\infty} R_{XY}(\tau) e^{-j\omega\tau}\, d\tau \tag{7-51}$$

$$R_{XY}(\tau) = \frac{1}{2\pi}\int_{-\infty}^{\infty} S_{XY}(\omega) e^{j\omega\tau}\, d\omega \tag{7-52}$$

$$S_{YX}(\omega) = \int_{-\infty}^{\infty} R_{YX}(\tau) e^{-j\omega\tau}\, d\tau \tag{7-53}$$

$$R_{YX}(\tau) = \frac{1}{2\pi}\int_{-\infty}^{\infty} S_{YX}(\omega) e^{j\omega\tau}\, d\omega \tag{7-54}$$

It is also possible to relate cross-spectral densities and crosscorrelation functions 
by means of the two-sided Laplace transform, just as was done with the usual 
spectral density and the autocorrelation function. Thus, for jointly stationary ran- 
dom processes 


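$$S_{XY}(s) = \int_{-\infty}^{\infty} R_{XY}(\tau) e^{-s\tau}\, d\tau$$

$$R_{XY}(\tau) = \frac{1}{2\pi j}\int_{-j\infty}^{j\infty} S_{XY}(s) e^{s\tau}\, ds$$

$$S_{YX}(s) = \int_{-\infty}^{\infty} R_{YX}(\tau) e^{-s\tau}\, d\tau$$

$$R_{YX}(\tau) = \frac{1}{2\pi j}\int_{-j\infty}^{j\infty} S_{YX}(s) e^{s\tau}\, ds$$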


When using the two-sided inverse Laplace transform to find a crosscorrelation
function, it is not possible to use symmetry to find the value of the
crosscorrelation function for negative values of $\tau$. Instead, the procedure discussed in Section
7-6 must be employed. An example will serve to illustrate this procedure once
more. Suppose we have a cross-spectral density given by


$$S_{XY}(\omega) = \frac{96}{\omega^2 - j2\omega + 8}$$


Note that this spectral density is complex for most values of $\omega$. Also, from the
properties of cross-spectral densities given above, the other cross-spectral density
is simply the conjugate of this one. Thus,


$$S_{YX}(\omega) = \frac{96}{\omega^2 + j2\omega + 8}$$

When $S_{XY}(\omega)$ is expressed as a function of s it becomes


$$S_{XY}(s) = \frac{-96}{s^2 + 2s - 8} = \frac{-96}{(s + 4)(s - 2)}$$


A partial fraction expansion yields

$$S_{XY}(s) = \frac{16}{s + 4} - \frac{16}{s - 2}$$

The lhp pole at s = -4 yields the positive $\tau$ function

$$\frac{16}{s + 4} \Leftrightarrow 16 e^{-4\tau} \qquad \tau > 0$$

In order to handle the rhp pole at s = 2, replace s by $-s$ and recognize the
inverse transform as

$$\frac{-16}{-s - 2} = \frac{16}{s + 2} \Leftrightarrow 16 e^{-2\tau}$$

If $\tau$ is now replaced by $-\tau$ and the two parts combined, the complete
crosscorrelation function becomes

$$R_{XY}(\tau) = \begin{cases} 16 e^{-4\tau} & \tau > 0 \\ 16 e^{2\tau} & \tau < 0 \end{cases}$$

The other crosscorrelation function can be obtained from the relation

$$R_{YX}(\tau) = R_{XY}(-\tau)$$

Thus,

$$R_{YX}(\tau) = \begin{cases} 16 e^{-2\tau} & \tau > 0 \\ 16 e^{4\tau} & \tau < 0 \end{cases}$$
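
As a cross-check on this example, the crosscorrelation function obtained by the partial-fraction procedure (16e^{-4τ} for positive τ and 16e^{2τ} for negative τ) can be transformed numerically with (7-51) and compared with the cross-spectral density 96/(ω² - j2ω + 8) that was the starting point. This is a rough sketch under stated assumptions: the function names, the integration limits, and the step count are illustrative choices, and a simple trapezoidal rule stands in for the Fourier integral.

```python
import cmath
import math

def R_xy(tau):
    """Crosscorrelation function found in the example."""
    return 16.0 * math.exp(-4.0 * tau) if tau >= 0 else 16.0 * math.exp(2.0 * tau)

def cross_spectral_numeric(omega, half_width=10.0, steps=100_000):
    """Trapezoidal approximation of integral R_xy(tau) exp(-j*omega*tau) dtau
    over [-half_width, half_width] (an assumed truncation of the infinite range)."""
    h = 2.0 * half_width / steps
    total = 0j
    for k in range(steps + 1):
        tau = -half_width + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * R_xy(tau) * cmath.exp(-1j * omega * tau)
    return total * h

def cross_spectral_closed(omega):
    """Cross-spectral density given at the start of the example."""
    return 96.0 / (omega**2 - 2j * omega + 8.0)

for omega in (0.0, 1.0, 3.0):
    print(omega, cross_spectral_numeric(omega), cross_spectral_closed(omega))
```

Replacing ω by -ω in either function gives the complex conjugate, which is the other cross-spectral density of the pair.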





Exercise 7—8.1 
For two jointly stationary random processes, the crosscorrelation func- 
tion is 
967 09 


= Q T«0 


Rxyy(7) 


a) Find the corresponding cross-spectral density.
b) Find the other cross-spectral density.


9 


Answers: CAN is £5 


Exercise 7—8.2 


Two jointly stationary random processes have a cross-spectral density 
of 


$$S_{XY}(\omega) = \frac{1}{-\omega^2 + j2\omega + 1}$$


Find the corresponding crosscorrelation function.


Answer: $\tau e^{-\tau} \qquad \tau > 0$





7—9 Measurement of Spectral Density 


When random phenomena are encountered in practical situations, it is often nec- 
essary to measure certain parameters of the phenomena in order to be able to 
determine how best to design the signal processing system. The case that is most 
easily handled, and the one that will be considered here, is that in which it can be 
assumed legitimately that the random process involved is ergodic. In such cases, 
it is possible to make estimates of various parameters of the process from appro- 
priate time averages. The problems associated with estimating the mean and the 
correlation function have been considered previously; it is now desired to consider 
how one may estimate the distribution of power throughout the frequency range 
occupied by the signal—that is, the spectral density. This kind of information is 
invaluable in many engineering applications. For example, knowing the spectral 
density of an unwanted or interfering signal often gives valuable clues as to the 
origin of the signal, and may lead to its elimination. In cases where elimination 
is not possible, knowledge of the power spectrum often permits design of appro- 
priate filters to reduce the effects of such signals. 

As an example of a typical problem of this sort, assume that there is available 
a continuous recording of a signal x(t) extending over the interval $0 \le t \le T$. The
signal x(t) is assumed to be a sample function from an ergodic random process. It
is desired to make an estimate of the spectral density $S_X(\omega)$ of the process from
which the recorded signal came.

It might be thought that a reasonable way to find the spectral density would be
to find the Fourier transform of the observed sample function and let the square 
of its magnitude be an estimate of the spectral density. This procedure does not 
work, however. Since the Fourier transform of the entire sample function does 
not even exist, it is not surprising to find that the Fourier transform of a portion 
of that sample function is a poor estimator of the desired spectral density. This 
procedure might be possible if one could take an ensemble average of the squared 
magnitude of the Fourier transform of all (or even some of) the sample functions 
of the process but since only one sample function is available no such direct 
approach is possible. 

An alternative to the above is to employ the mathematical relationship between 
the spectral density and the autocorrelation function, as given by (7—40). Since it 
is possible to estimate autocorrelation functions from a single sample function, as 
discussed in Section 6—4, the Fourier transform of this estimate will be an esti- 
mate of the spectral density. It is this approach that will be discussed here. 

It is shown in (6-14) that an estimate of the autocorrelation function of an
ergodic process can be obtained from


$$\hat{R}_X(\tau) = \frac{1}{T - \tau}\int_{0}^{T - \tau} X(t)X(t + \tau)\, dt \qquad 0 \le \tau \ll T \tag{6-14}$$


when X(t) is an arbitrary member of the ensemble. Since $\tau$ must be much smaller
than T, the length of the record, let the largest permissible value of $\tau$ be
designated by $\tau_m$. Thus, $\hat{R}_X(\tau)$ has a value given by (6-14) whenever $|\tau| \le \tau_m$ and is
assumed to be zero whenever $|\tau| > \tau_m$. A more general way of introducing this
limitation on the size of $\tau$ is to multiply (6-14) by an even function of $\tau$ that is
zero when $|\tau| > \tau_m$. Thus, define a new estimate of $R_X(\tau)$ as


$${}_w\hat{R}_X(\tau) = \frac{w(\tau)}{T - \tau}\int_{0}^{T - \tau} X(t)X(t + \tau)\, dt = w(\tau)\, {}_a\hat{R}_X(\tau) \tag{7-55}$$


where $w(\tau) = 0$ when $|\tau| > \tau_m$ and is an even function of $\tau$, and ${}_a\hat{R}_X(\tau)$ is now
assumed to exist for all $\tau$. The function $w(\tau)$ is often referred to as a "lag
window" since it modifies the estimate of $R_X(\tau)$ by an amount that depends upon the
"lag" (that is, the time delay $\tau$) and has a finite width of $2\tau_m$. The purpose of
introducing $w(\tau)$, and the choice of a suitable form for it, are extremely important
aspects of estimating spectral densities that are all too often overlooked by
engineers attempting to make such estimates. The following brief discussion of these
topics is hardly adequate to provide a complete understanding, but it may serve to
introduce the concepts and to indicate their importance in making meaningful
estimates.

Since the spectral density is the Fourier transform of the autocorrelation
function, an estimate of spectral density can be obtained by transforming (7-55).
Thus,


$${}_w\hat{S}_X(\omega) = \mathcal{F}[w(\tau)\, {}_a\hat{R}_X(\tau)] = \frac{1}{2\pi} W(\omega) * {}_a\hat{S}_X(\omega) \tag{7-56}$$


where $W(\omega)$ is the Fourier transform of $w(\tau)$ and the symbol * implies convolution
of transforms. ${}_a\hat{S}_X(\omega)$ is the spectral density associated with ${}_a\hat{R}_X(\tau)$, which is now
defined for all $\tau$ but cannot be estimated for all $\tau$.

In order to discuss the purpose of the window function, it is important to
emphasize that there is a particular window function present even if the problem is
ignored. Since (6-14) exists only for $|\tau| \le \tau_m$, it would be equivalent to (7-55)
if a rectangular window defined by


$$w_r(\tau) = \begin{cases} 1 & |\tau| \le \tau_m \\ 0 & |\tau| > \tau_m \end{cases} \tag{7-57}$$


were used. Thus, not assigning any window function is really equivalent to using
the rectangular window of (7-57). The significance of using a window function
of this type can be seen by noting that the corresponding Fourier transform of the
rectangular window is


$$\mathcal{F}[w_r(\tau)] = W_r(\omega) = 2\tau_m \left(\frac{\sin \omega\tau_m}{\omega\tau_m}\right) \tag{7-58}$$

and that this transform is negative half the time, as seen in Figure 7-9. Thus,
convolving it with ${}_a\hat{S}_X(\omega)$ can lead to negative values for the estimated spectral
density, even though ${}_a\hat{S}_X(\omega)$ itself is never negative. Also, the fact that $\hat{R}_X(\tau)$ can
be estimated for only a limited range of $\tau$-values (namely, $|\tau| \le \tau_m$) because of
the finite length of the observed data ($\tau_m \ll T$) may lead to completely erroneous
estimates of spectral density, regardless of how accurately $\hat{R}_X(\tau)$ is known within
its range of values.

Figure 7-9 (a) The rectangular-window function and (b) its transform.

The estimate provided by the rectangular window will be designated


$${}_r\hat{S}_X(\omega) = \frac{1}{2\pi} W_r(\omega) * {}_a\hat{S}_X(\omega) \tag{7-59}$$


It should be noted, however, that it is not found by carrying out the convolution
indicated in (7-59), since ${}_a\hat{S}_X(\omega)$ cannot be estimated from the limited data
available, but instead is just the Fourier transform of $\hat{R}_X(\tau)$ as defined by (6-14). That
is,


$${}_r\hat{S}_X(\omega) = \mathcal{F}[{}_r\hat{R}_X(\tau)] \tag{7-60}$$

where

$${}_r\hat{R}_X(\tau) = \begin{cases} \dfrac{1}{T - \tau}\displaystyle\int_{0}^{T - \tau} X(t)X(t + \tau)\, dt & 0 \le \tau \le \tau_m \\ 0 & \tau > \tau_m \end{cases}$$

and

$${}_r\hat{R}_X(\tau) = {}_r\hat{R}_X(-\tau) \qquad \tau < 0$$


Thus, as noted above, ${}_r\hat{S}_X(\omega)$ is the estimate obtained by ignoring the
consequences of the limited range of $\tau$-values. The problem that now arises is how to
modify ${}_r\hat{S}_X(\omega)$ so as to minimize the erroneous results that occur. It is this
problem that leads to choosing other shapes for the window function $w(\tau)$.


The source of the difficulty associated with ${}_r\hat{S}_X(\omega)$ is the sidelobes of $W_r(\omega)$.
Clearly, this difficulty could be overcome by selecting a window function that has
very small sidelobes in its transform. One such window function that has been
used extensively is the so-called "Hamming window," named after the man who
suggested it, and given by


$$w_h(\tau) = \begin{cases} 0.54 + 0.46 \cos \dfrac{\pi\tau}{\tau_m} & |\tau| \le \tau_m \\ 0 & |\tau| > \tau_m \end{cases} \tag{7-61}$$


This window and its transform are shown in Figure 7-10.

Figure 7-10 (a) The Hamming-window function and (b) its Fourier transform.

The resulting estimate of the spectral density is given formally by


$${}_h\hat{S}_X(\omega) = \frac{1}{2\pi} W_h(\omega) * {}_a\hat{S}_X(\omega) \tag{7-62}$$


but, as before, this convolution cannot be carried out because ${}_a\hat{S}_X(\omega)$ is not
available. However, if it is noted that

$$w_h(\tau) = \left(0.54 + 0.46 \cos \frac{\pi\tau}{\tau_m}\right) w_r(\tau)$$


then it follows that

$$\mathcal{F}[w_h(\tau)] = W_h(\omega) = \left\{0.54\,\delta(\omega) + 0.23\left[\delta\left(\omega + \frac{\pi}{\tau_m}\right) + \delta\left(\omega - \frac{\pi}{\tau_m}\right)\right]\right\} * W_r(\omega)$$


since the nontruncated constant and cosine term of the Hamming window have
Fourier transforms that are $\delta$-functions. Substituting this into (7-62), and utilizing
(7-59), leads immediately to


$${}_h\hat{S}_X(\omega) = 0.54\, {}_r\hat{S}_X(\omega) + 0.23\left[{}_r\hat{S}_X\left(\omega + \frac{\pi}{\tau_m}\right) + {}_r\hat{S}_X\left(\omega - \frac{\pi}{\tau_m}\right)\right] \tag{7-63}$$


Since ${}_r\hat{S}_X(\omega)$ can be found by using (7-60), it follows that (7-63) represents the
modification of ${}_r\hat{S}_X(\omega)$ that is needed to ensure that the resulting estimate is always
positive.

In the discussion of estimating autocorrelation functions in Section 6-4, it was
noted that in almost all practical cases the observed record would be sampled at
discrete times of 0, $\Delta t$, $2\Delta t$, . . . , $N\Delta t$ and the resulting estimate formed by the
summation:

$$\hat{R}_X(n\Delta t) = \frac{1}{N - n + 1}\sum_{k=0}^{N-n} X_k X_{k+n} \qquad n = 0, 1, \ldots, M \tag{7-64}$$


Since the autocorrelation function is estimated for discrete values of $\tau$ only, it is
necessary to perform a discrete approximation to the Fourier transform. Although
there are techniques for doing this that conserve computer time (the so-called
"fast Fourier transform"), it is convenient in this discussion to consider the
discrete version of (7-42), which is simply the Fourier cosine transform of the
autocorrelation function. Thus, the estimated rectangular-window spectral density is


$${}_r\hat{S}_X(q\,\Delta\omega) = \Delta t\left[\hat{R}_X(0) + 2\sum_{n=1}^{M-1} \hat{R}_X(n\,\Delta t) \cos \frac{qn\pi}{M} + \hat{R}_X(M\,\Delta t) \cos q\pi\right] \tag{7-65}$$

where q = 0, 1, 2, . . . , M and

$$\Delta\omega = \frac{\pi}{M\,\Delta t}$$


The corresponding Hamming-window estimate is

$${}_h\hat{S}_X(q\,\Delta\omega) = 0.54\, {}_r\hat{S}_X(q\,\Delta\omega) + 0.23\left\{{}_r\hat{S}_X[(q + 1)\,\Delta\omega] + {}_r\hat{S}_X[(q - 1)\,\Delta\omega]\right\} \tag{7-66}$$


and this represents the final form of the estimate. A computer program for
carrying out a Hamming-window estimate of spectral density is given in Appendix G.

In order to illustrate this method of estimating spectral density suppose that we
have estimated the autocorrelation function of an ergodic random process, using
(7-64) for M = 5 with $\Delta t$ = 0.01. Let the resulting values of the estimated
autocorrelation function be


    n    $\hat{R}_X(n\,\Delta t)$
    0    10
    1    8
    2    6
    3    4
    4    2
    5    0


For the specified values of M and $\Delta t$, the spacing between spectral estimates
becomes


$$\Delta\omega = \frac{\pi}{M\,\Delta t} = \frac{\pi}{(5)(0.01)} = 20\pi \text{ radians/second}$$


Using the estimated values of autocorrelation the rectangular-window estimate of
spectral density may be written from (7-65) as


$${}_r\hat{S}_X(q\,\Delta\omega) = 0.01\left[10 + 2\left(8\cos\frac{q\pi}{5} + 6\cos\frac{2q\pi}{5} + 4\cos\frac{3q\pi}{5} + 2\cos\frac{4q\pi}{5}\right)\right]$$

This may be evaluated for values of q ranging from 0 to 5 and the resulting
rectangular-window estimate is


    q    ${}_r\hat{S}_X(q\,\Delta\omega)$
    0    0.5
    1    0.2094
    2    0
    3    0.0306
    4    0
    5    0.020


The final Hamming-window estimate is found by using these values in (7-66).
The resulting values are shown below.


    q    ${}_h\hat{S}_X(q\,\Delta\omega)$
    0    0.3664
    1    0.2281
    2    0.0552
    3    0.0165
    4    0.0116
    5    0.0108


Although the length of the correlation function sample used is too short to give a
very good estimate, this example does illustrate the method of applying a
Hamming window and demonstrates the smoothing that such a window achieves.
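
The arithmetic of this example can be reproduced with a few lines of code. The sketch below is not the program of Appendix G; it is a minimal illustration of (7-65) and (7-66) with hypothetical function names. The handling of the two end points is an assumption: the value below q = 0 is taken from q = 1 (the estimate is even in frequency) and the value above q = M is taken from q = M - 1 (the cosine sum is symmetric about q = M). This choice is consistent with the tabulated values.

```python
import math

def rectangular_estimate(R, dt):
    """Rectangular-window estimate (7-65) from R[0..M] spaced dt apart."""
    M = len(R) - 1
    estimate = []
    for q in range(M + 1):
        s = R[0] + R[M] * math.cos(q * math.pi)
        s += 2.0 * sum(R[n] * math.cos(q * n * math.pi / M) for n in range(1, M))
        estimate.append(dt * s)
    return estimate

def hamming_estimate(rect):
    """Hamming-window estimate (7-66) from the rectangular-window values."""
    M = len(rect) - 1
    smoothed = []
    for q in range(M + 1):
        lower = rect[abs(q - 1)]                        # assumed: even about q = 0
        upper = rect[q + 1] if q < M else rect[M - 1]   # assumed: symmetric about q = M
        smoothed.append(0.54 * rect[q] + 0.23 * (lower + upper))
    return smoothed

R_hat = [10.0, 8.0, 6.0, 4.0, 2.0, 0.0]   # estimated autocorrelation values of the example
dt = 0.01

rect = rectangular_estimate(R_hat, dt)
hamm = hamming_estimate(rect)
for q, (r, h) in enumerate(zip(rect, hamm)):
    print(q, round(r, 4), round(h, 4))
```

The printed values agree with the two tables above to within rounding in the last digit.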

Many other window functions have been proposed for spectral estimation and 
some give better results than the Hamming window, although they may not be as 
easy to use. There is, for example, the Bartlett window, which is simply an isos- 
celes triangle, that can be applied very readily to the autocorrelation estimate, but 
requires that the actual convolution be carried out when applied to the spectral 
function. Another well-known window function is the "hanning window," which
is a modified form of the Hamming window. Both of these window functions are 
considered further in the exercises and problems that follow. 

Although the problem of evaluating the quality of spectral density estimates is
very important, it is also quite difficult. In the first place, Hamming-window
estimates are not unbiased; that is, the expected value of the estimate is not the true
value of the spectral density. Secondly, it is very difficult to determine the
variance of the estimate, although a rough approximation to this variance can be
expressed as


$$\text{Var}\left[{}_h\hat{S}_X(q\,\Delta\omega)\right] \approx \frac{M}{N}\, S_X^2(q\,\Delta\omega) \tag{7-67}$$


when $2M\,\Delta t$ is large enough to include substantially all of the autocorrelation
function.

When the spectral density being measured is quite nonuniform over the
frequency band, the Hamming-window estimate may give rise to serious errors that
can be minimized by "whitening," that is, by modifying the spectrum in a
known way to make it more nearly uniform. A particularly severe error of this
sort arises when the observed data contains a dc component, since this represents
a $\delta$ function in the spectral density. In such cases, it is very important to remove
the dc component from the data before proceeding with the analysis.





Exercise 7—9.1 


A stationary random process having an autocorrelation function of 
sin ug 
10077 
has its autocorrelation function estimated over a range of $|\tau| \le 0.04$.


If a rectangular-window function of this width is used, determine the
estimated spectral density at $\omega = 0$ and $\omega = 100\pi$.


Rx(7) = i( 


Answers: 0.05, 0.1 


Exercise 7—9.2 


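The Bartlett-window function is defined by


$$w_b(\tau) = \begin{cases} 1 - \dfrac{|\tau|}{\tau_m} & |\tau| \le \tau_m \\ 0 & |\tau| > \tau_m \end{cases}$$


Find the Fourier transform, $W_b(\omega)$, of this window function.


Answer: $\tau_m \left[\dfrac{\sin(\omega\tau_m/2)}{\omega\tau_m/2}\right]^2$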





7-10 Examples and Applications of Spectral Density


The most important application of spectral density is in connection with the
analysis of linear systems having random inputs. However, since this application is
considered in detail in the next chapter, it will not be discussed here. Instead,
some examples are given that emphasize the properties of spectral density and the
computational techniques.

Figure 7-11 A binary signal with (a) rectangular pulses and (b) raised cosine
pulses.

The first example considers the signal in a binary communication system—that 
is, one in which the message is conveyed by the polarities of a sequence of pulses. 
The obvious form of pulse to use in such a system is the rectangular one shown 
in Figure 7-11(a). These pulses all have the same amplitude, but the polarities 
are either plus or minus with equal probability and are independent from pulse to 
pulse. However, the steep sides of such pulses tend to make this signal occupy 
more bandwidth than is desirable. An alternative form of pulse is the raised-cosine 
pulse as shown in Figure 7—11(b). The question to be answered concerns the 
amount by which the bandwidth is reduced by using this type of pulse rather than 
the rectangular one. 

Both of the random processes described above have spectral densities that can 
be described by the general result in (7—25). In both cases, the mean value of 
pulse amplitude is zero (since each polarity is equally probable), and the variance 
of pulse amplitude is $A^2$ for the rectangular pulse and $B^2$ for the raised-cosine
pulse. (See the discussion of delta distributions in Section 2-7.) Thus, all that is
necessary is to find $|F(\omega)|^2$ for each pulse shape.

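For the rectangular pulse, the shape function f(t) is


$$f(t) = \begin{cases} 1 & |t| \le t_1/2 \\ 0 & |t| > t_1/2 \end{cases}$$


Hence, its Fourier transform is


$$F(\omega) = \int_{-t_1/2}^{t_1/2} (1) e^{-j\omega t}\, dt = t_1 \left[\frac{\sin(\omega t_1/2)}{\omega t_1/2}\right]$$

and, from (7-25), the spectral density of the binary signal is


$$S_X(\omega) = A^2 t_1 \left[\frac{\sin(\omega t_1/2)}{\omega t_1/2}\right]^2 \tag{7-68}$$


which has a maximum value at $\omega = 0$.
For the raised-cosine pulse the shape function is


$$f(t) = \begin{cases} \dfrac{1}{2}\left(1 + \cos \dfrac{2\pi t}{t_1}\right) & |t| \le t_1/2 \\ 0 & |t| > t_1/2 \end{cases}$$


The Fourier transform of this shape function becomes


$$F(\omega) = \frac{1}{2}\int_{-t_1/2}^{t_1/2} \left(1 + \cos \frac{2\pi t}{t_1}\right) e^{-j\omega t}\, dt = \frac{t_1}{2} \left[\frac{\sin(\omega t_1/2)}{\omega t_1/2}\right] \left[\frac{\pi^2}{\pi^2 - (\omega t_1/2)^2}\right]$$


and the corresponding spectral density is


$$S_X(\omega) = \frac{B^2 t_1}{4} \left[\frac{\sin(\omega t_1/2)}{\omega t_1/2}\right]^2 \left[\frac{\pi^2}{\pi^2 - (\omega t_1/2)^2}\right]^2 \tag{7-69}$$


which has a maximum value at $\omega = 0$.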

In evaluating the bandwidths of these spectral densities, there are many differ- 
ent criteria that might be employed. However, when one wishes to reduce the 
interference between two communication systems it is reasonable to consider the 
bandwidth outside of which the signal spectral density is below some specified 
fraction (say, 1 percent) of the maximum spectral density. That is, one wishes to
find the value of $\omega_1$ such that


$$\frac{S_X(\omega)}{S_X(0)} \le 0.01 \qquad |\omega| > \omega_1$$

Since sin $(\omega t_1/2)$ can never be larger than 1, this condition will be assured for
(7-68) when

$$\frac{A^2 t_1 \left(\dfrac{2}{\omega_1 t_1}\right)^2}{A^2 t_1} \le 0.01$$

from which

$$\omega_1 \ge \frac{20}{t_1}$$



for the case of rectangular pulses. When raised-cosine pulses are used, this con- 
dition becomes 


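$$\frac{\dfrac{B^2 t_1}{4} \left(\dfrac{2}{\omega_1 t_1}\right)^2 \left[\dfrac{\pi^2}{\pi^2 - (\omega_1 t_1/2)^2}\right]^2}{\dfrac{B^2 t_1}{4}} \le 0.01$$


This leads to

$$\omega_1 \ge \frac{10.7}{t_1} \text{ (approximately)}$$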


It is clear that the use of raised-cosine pulses, rather than rectangular pulses, has 
cut the bandwidth almost in half (when bandwidth is specified according to this 
criterion). 
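
The two 1-percent bandwidths can be compared numerically. The sketch below is an illustration under the same bounding argument used in the text (the sine factor is replaced by its maximum value of 1); the normalized variable x = ω₁t₁/2, the bisection helper, and the search interval are assumptions introduced here.

```python
import math

def bisect(f, lo, hi, tol=1e-12):
    """Find a root of f on [lo, hi], assuming f(lo) > 0 > f(hi) and f decreasing."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Rectangular pulse: (1/x)^2 = 0.01 with x = w1*t1/2.
rect_excess = lambda x: (1.0 / x) ** 2 - 0.01
x_rect = bisect(rect_excess, 1.0, 100.0)

# Raised-cosine pulse: (1/x)^2 * (pi^2 / (x^2 - pi^2))^2 = 0.01 for x > pi.
rc_excess = lambda x: (math.pi**2 / (x * (x * x - math.pi**2))) ** 2 - 0.01
x_rc = bisect(rc_excess, math.pi + 1e-6, 20.0)

w1_rect_times_t1 = 2.0 * x_rect   # bandwidth times pulse width, rectangular pulse
w1_rc_times_t1 = 2.0 * x_rc       # bandwidth times pulse width, raised-cosine pulse
print(w1_rect_times_t1, w1_rc_times_t1, w1_rc_times_t1 / w1_rect_times_t1)
```

Under these assumptions the ratio of the two bandwidths comes out slightly above one-half, in line with the statement that the raised-cosine pulse cuts the bandwidth almost in half.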

Almost all of the examples of spectral density that have been considered 
throughout this chapter have been low-pass functions—that is, the spectral density 
has had its maximum value at œ = 0. However, many practical situations arise 
in which the maximum value of spectral density occurs at some high frequency, 
and the second example will illustrate a situation of this sort. Figure 7-12 shows 
a typical band-pass spectral density and the corresponding pole-zero configuration. 
The complex frequency representation for this spectral density is obtained easily 
from the pole-zero plot. Thus, 


$$S_X(s) = \frac{S_0(s)(-s)}{(s + \alpha + j\omega_0)(s + \alpha - j\omega_0)(s - \alpha + j\omega_0)(s - \alpha - j\omega_0)}$$

$$= \frac{-S_0 s^2}{[(s + \alpha)^2 + \omega_0^2][(s - \alpha)^2 + \omega_0^2]} \tag{7-70}$$


where $S_0$ is a scale factor. Note that this spectral density is zero at zero frequency.

Figure 7-12 (a) A band-pass spectral density and (b) the corresponding pole-zero
plot.

The mean-square value associated with this spectral density can be obtained by
either of the methods discussed in Section 7-5. If Table 7-1 is used, it is seen
readily that


$$c(s) = s \qquad c_1 = 1 \qquad c_0 = 0$$

$$d(s) = s^2 + 2\alpha s + \alpha^2 + \omega_0^2 \qquad d_2 = 1 \qquad d_1 = 2\alpha \qquad d_0 = \alpha^2 + \omega_0^2$$


The mean-square value is then related to $I_2$ by


$$\overline{X^2} = S_0 I_2 = S_0\, \frac{c_1^2 d_0 + c_0^2 d_2}{2 d_0 d_1 d_2} = S_0\, \frac{(1)^2(\alpha^2 + \omega_0^2) + 0}{2(\alpha^2 + \omega_0^2)(2\alpha)(1)} = \frac{S_0}{4\alpha}$$


An interesting result of this calculation is that the mean-square value depends only
upon the bandwidth parameter $\alpha$ and not upon the center frequency $\omega_0$.

The third example concerns the physical interpretation of spectral density as 
implied by the relation 


$$\overline{X^2} = \frac{1}{2\pi}\int_{-\infty}^{\infty} S_X(\omega)\, d\omega$$

Although this expression only relates the total mean-square value of the process
to the total area under the spectral density, there is a further implication that the
mean-square value associated with any range of frequencies is similarly related to
the partial area under the spectral density within that range of frequencies. That
is, if one chooses any pair of frequencies, say $\omega_1$ and $\omega_2$, the mean-square value
of that portion of the random process having energy between these two
frequencies is

$$\overline{X^2}(\omega_1, \omega_2) = \frac{1}{2\pi}\left[\int_{-\omega_2}^{-\omega_1} S_X(\omega)\, d\omega + \int_{\omega_1}^{\omega_2} S_X(\omega)\, d\omega\right] = \frac{1}{\pi}\int_{\omega_1}^{\omega_2} S_X(\omega)\, d\omega \tag{7-71}$$


The second form in (7-71) is a consequence of $S_X(\omega)$ being an even function
of $\omega$.

As an illustration of this concept, consider again the spectral density derived in
(7-41). This was


$$S_X(\omega) = \frac{2A\beta}{\omega^2 + \beta^2} \tag{7-41}$$

where A is the total mean-square value of the process. Suppose it is desired to
find the frequency above which one-half the total mean-square value (or average
power) exists. This means that we want to find the $\omega_1$ (with $\omega_2 = \infty$) for which

$$\frac{1}{\pi}\int_{\omega_1}^{\infty} \frac{2A\beta}{\omega^2 + \beta^2}\, d\omega = \frac{1}{2}\left[\frac{1}{\pi}\int_{0}^{\infty} \frac{2A\beta}{\omega^2 + \beta^2}\, d\omega\right] = \frac{1}{2} A$$


Thus,


$$\int_{\omega_1}^{\infty} \frac{d\omega}{\omega^2 + \beta^2} = \frac{\pi}{4\beta}$$


since the A cancels out. The integral becomes

$$\left. \frac{1}{\beta} \tan^{-1} \frac{\omega}{\beta} \right|_{\omega_1}^{\infty} = \frac{1}{\beta}\left[\frac{\pi}{2} - \tan^{-1} \frac{\omega_1}{\beta}\right] = \frac{\pi}{4\beta}$$

from which

$$\tan^{-1} \frac{\omega_1}{\beta} = \frac{\pi}{4}$$

and

$$\omega_1 = \beta$$

Thus, one-half of the average power of this process occurs at frequencies above
$\beta$ and one-half below $\beta$. Note that in this particular case, $\beta$ is also the frequency
at which the spectral density is one-half of its maximum value at $\omega = 0$. This
result is peculiar to this particular spectral density and is not true in general. For
example, in the band-limited white-noise case shown in Figure 7-8, the spectral
density reaches one-half of its maximum value at $\omega = 2\pi W$, but one-half of the
average power occurs at frequencies greater than $\omega = \pi W$. These conclusions are
obvious from the sketch.
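
The same calculation can be carried out for any fraction of the average power, not only one-half. The following sketch evaluates the closed-form result of the arctangent integral for the spectral density of (7-41); the function names and the value of β are illustrative assumptions.

```python
import math

def power_fraction_above(omega_1, beta):
    """Fraction of the mean-square value of S_X = 2*A*beta/(w^2 + beta^2)
    lying at |w| > omega_1; the factor A cancels out."""
    return 1.0 - (2.0 / math.pi) * math.atan(omega_1 / beta)

def frequency_for_fraction(p, beta):
    """Frequency above which a fraction p of the average power lies
    (the inverse of power_fraction_above)."""
    return beta * math.tan((1.0 - p) * math.pi / 2.0)

beta = 3.0                                   # illustrative value
print(frequency_for_fraction(0.5, beta))     # half-power frequency, equal to beta
print(power_fraction_above(beta, beta))      # one-half
print(frequency_for_fraction(0.01, beta) / beta)
```

The last line shows how far out one has to go before only 1 percent of the power remains: a little under 64 times β for this spectral density, which illustrates how slowly its tails fall off.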





Exercise 7-10.1

An nth-order Butterworth spectrum is one whose spectral density is
given by

$$S_X(\omega) = \frac{1}{1 + (\omega/2\pi W)^{2n}}$$

in which W is the so-called half-power bandwidth.

a) Find the bandwidth outside of which the spectral density is less
than 1 percent of its maximum value.

b) For n = 1, find the bandwidth outside of which no more than 1
percent of the average power exists.

Answers: $2\pi W(99)^{1/2n}$, $400W$


Exercise 7-10.2

Suppose that the binary communication system discussed in this section
uses triangular pulses instead of rectangular or raised-cosine
pulses. Specifically, let

$$f(t) = \begin{cases} 1 - \dfrac{2|t|}{t_1}, & |t| \le \dfrac{t_1}{2} \\ 0, & |t| > \dfrac{t_1}{2} \end{cases}$$

Find the bandwidth of this signal using the same criterion as used in
the example.

Answer: $12.56/t_1$





PROBLEMS 


7-1.1 A sample function from a random process has the form

$$X(t) = M \qquad |t| \le T$$

and is zero elsewhere. The random variable M is uniformly distributed
between -6 and 18.

a) Find the mean value of the random process.
b) Find the Fourier transform of this sample function.
c) Find the expected value of the Fourier transform.
d) What happens to the Fourier transform as T approaches infinity?


7-2.1 a) Use Parseval's theorem to evaluate the following integral:


f sin 4w | / sin 8w 4
— 0 de) | Bc) i

b) Use Parseval's theorem to evaluate


E l
=
ls F Fa

7-2.2 A stationary random process has a spectral density of

$$S_X(\omega) = \begin{cases} 1 - \dfrac{|\omega|}{8\pi}, & |\omega| \le 8\pi \\ 0, & \text{elsewhere} \end{cases}$$

Find the mean-square value of this process.

7-2.3 A random process with a spectral density of $S_X(\omega)$ has a mean-square
value of 4. Find the mean-square values of random processes having
each of the following spectral densities:

a) $4S_X(\omega)$
b) $S_X(4\omega)$
c) $S_X(\omega/4)$
d) $4S_X(4\omega)$

7-3.1 For each of the following functions of $\omega$, state whether it can or cannot
be a valid expression for the spectral density of a random process. If it
cannot be a spectral density, state why not.


l 


a) w + 3w + 1 
2 

| + 16 

b) — 7 

w + Iw + 18 

C) 10e - ^ 
w + 4 

d -3 


| 2 
e) I — cos o 
EF. 





7-3.2 A stationary random process has sample functions of the form

$$X(t) = M + 5\cos(10t + \theta_1) + 10\sin(5t + \theta_2)$$

where M is a random variable that is uniformly distributed between -3
and +9, and $\theta_1$ and $\theta_2$ are random variables that are uniformly distributed
between 0 and $2\pi$. All three random variables are mutually independent.

a) Find the mean value of this process.
b) Find the variance of the process.
c) Find the spectral density of the process.

7-3.3 A stationary random process has a spectral density of

$$S_X(\omega) = 32\pi\delta(\omega) + 8\pi\delta(\omega - 6) + 8\pi\delta(\omega + 6) + 32\pi\delta(\omega - 12) + 32\pi\delta(\omega + 12)$$

a) Find the mean value of this process.
b) Find the variance of this process.
c) List all discrete frequency components of the process.

7-3.4 [Figure: random pulse sequence, with possible pulse positions spaced 0.1 second apart and referenced to the time $t_0$.]

In the random pulse sequence shown above, pulses may occur or not
occur with equal probability at periodic time intervals of 0.1 second. The
reference time $t_0$ for any sample function is a random variable uniformly
distributed over the interval of 0.1 second.

a) Find the mean value of this process.
b) Find the variance of this process.
c) Find the spectral density of the process.

7-4.1 A stationary random process has a spectral density of

$$S_X(\omega) = \frac{16(\omega^2 + 36)}{\omega^4 + 13\omega^2 + 36}$$

a) Write this spectral density as a function of the complex frequency s.
b) List all of the pole and zero frequencies.
c) Find the value of the spectral density at a frequency of 1 Hz.
d) Suppose this spectral density is to be scaled in frequency so that its
value at zero frequency is unchanged but its value at 100 Hz is the
same as it previously had at 1 Hz. Write the new spectral density
as a function of s.


7-4.2 A given spectral density has a value of 10 V²/Hz at zero frequency. Its
zeros in the complex frequency plane are at ±5 and its poles are at ±2 ± j5
and ±6 ± j3.

a) Write the spectral density as a function of s.
b) Write the spectral density as a function of $\omega$.
c) Find the value of the spectral density at a frequency of 1 Hz.

7-5.1 a) Find the mean-square value of the spectral density in Problem
7-3.1(a).

b) Find the mean-square value of the spectral density in Problem
7-3.1(d).

7-5.2 a) Find the mean-square value of the spectral density in Problem 7-3.2
using Table 7-1.

b) Repeat part (a) using contour integration in the complex frequency
plane.

7-5.3 Find the mean-square value of a stationary random process whose spectral
density is


— 5 


a o 
x6) = a — 529 4 576 


7-5.4 Find the mean-square value of a stationary random process whose spectral
density is

E xs ee ee ey — 3) + 278 ( 3)
Sa? + 4 e TÒ (w TÒ (w +

7-6.1 A stationary random process has an autocorrelation function of

$$R_X(\tau) = \begin{cases} 10\left[1 - \dfrac{|\tau|}{0.05}\right], & |\tau| \le 0.05 \\ 0, & \text{elsewhere} \end{cases}$$

a) Find the variance of this process.
b) Find the spectral density of this process.
c) State the relation between the frequencies, in Hz, at which the spectral
density is zero and the value of $\tau$ at which the autocorrelation
function goes to zero.

7-6.2 A stationary random process has an autocorrelation function of

Ry (7) = 16e ^?" cos 20z7 + 8 cos 1077

a) Find the variance of this process.
b) Find the spectral density of this process.
c) Find the value of the spectral density at zero frequency.

7-6.3 A stationary random process has a spectral density of

$$S_X(\omega) = \begin{cases} 5, & 10 \le |\omega| \le 20 \\ 0, & \text{elsewhere} \end{cases}$$

a) Find the mean-square value of this process.
b) Find the autocorrelation function of this process.
c) Find the value of the autocorrelation function at $\tau = 0$.

7-6.4 A nonstationary random process has an autocorrelation function of


Ry (t, t + 7) = 8e ^?" (cos 2078) 


a) Find the spectral density of this process. 





b) Find the autocorrelation function of the stationary random process
that has the same spectral density.

7-7.1 A stationary random process has a spectral density of

9


ME EFA

a) Write an expression for the spectral density of bandlimited white
noise that has the same value at zero frequency and the same mean-square
value as the above spectral density.

b) Find the autocorrelation function of the process having the original
spectral density.

c) Find the autocorrelation function of the bandlimited white noise of
part (a).

d) Compare the values of these two autocorrelation functions at $\tau = 0$.
Compare the areas of these two autocorrelation functions.

7-7.2 A stationary random process has a spectral density of

$$S_X(\omega) = \begin{cases} 0.01, & |\omega| \le 1000\pi \\ 0, & \text{elsewhere} \end{cases}$$

a) Find the autocorrelation function of this process.
b) Find the smallest value of $\tau$ for which the autocorrelation is zero.
c) Find the correlation between samples of this process taken at a rate
of 1000 samples/second. Repeat if the sampling rate is 1500 samples/second.

7-8.1 A stationary random process with sample functions X(t) has a spectral
density of

$$S_X(\omega) = \frac{16}{\omega^2 + 16}$$

and an independent stationary random process with sample functions Y(t)
has a spectral density of
2 


@ 
19) ET 


A new random process is formed from U(t) = X(t) + Y(t).

a) Find the spectral density of U(t).
b) Find the cross-spectral density $S_{XY}(\omega)$.
c) Find the cross-spectral density $S_{XU}(\omega)$.

7-8.2 For the two random processes of Problem 7-8.1, a new random process
is formed from V(t) = X(t) - Y(t). Find the cross-spectral density
$S_{UV}(\omega)$.

7-9.1 Refer to the estimated autocorrelation function from the data given in
Problem 6-4.1.

a) Using the estimated autocorrelation function of part (a), compute the
corresponding rectangular-window estimate of the spectral density
for q = 0, 1, 2, 3.

b) Using the results of part (a), find the Hamming-window estimate of
the spectral density.

c) Determine the approximate variance of the estimated spectral density
for q = 0.

7-9.2 One of the earliest window functions employed for smoothing spectra is
the so-called "hanning window" defined as

$$w(\tau) = \begin{cases} 0.5 + 0.5\cos\left(\dfrac{\pi\tau}{\tau_m}\right), & |\tau| \le \tau_m \\ 0, & \text{elsewhere} \end{cases}$$

a) Derive an expression for the hanning-window estimate that is similar
to (7-66) for the Hamming window.

b) Compare the sidelobe levels for the Hamming window and the hanning
window.

7-9.3 Using the same data as in Problem 7-9.1, find the hanning-window estimate
of the spectral density.

7-10.1 Consider a binary communication system using raised-cosine pulses defined
by

$$f(t) = \frac{1}{2}\left[1 + \cos\left(\frac{\pi t}{t_1}\right)\right] \qquad |t| \le t_1$$

and zero elsewhere. Note that these pulses are twice as wide as those
shown in Figure 7-11(b), but that the message bit duration is still $t_1$.
Thus, the pulses overlap in time, but at the peak of each pulse, all
earlier and later pulses are zero. The objective of this is to reduce the
bandwidth still further.

a) Write the spectral density of the resulting sequence of pulses.

b) Find the value of $\omega_1$ such that the spectral density is less than 1
percent of the maximum spectral density for all higher frequencies.

c) What can you conclude about the bandwidth of this communication
system as compared to the ones discussed in Section 7-10?

7-10.2 A stationary random process has a spectral density having poles in the
complex frequency plane located at s = ±10 ± j100.

a) Find the half-power bandwidth in Hz of this spectral density. Half-power
bandwidth is simply the frequency increment between frequencies
at which the spectral density is one-half of its maximum
value.

b) Find the bandwidth between frequencies at which the spectral density
is 1 percent of its maximum value.

7-10.3 A binary communication system using rectangular pulses is transmitting
messages at a rate of 2400 bits/second. Determine the approximate frequency
below which 90% of the average power is contained.

7-10.4 A spectral density having an nth-order synchronous shape is of the form

$$S_X(\omega) = \frac{1}{\left[1 + (\omega/2\pi B_1)^2\right]^n}$$

a) Express the half-power bandwidth (in Hz) of this spectral density in
terms of $B_1$.

b) Find the value of frequency above which the spectral density is always
less than 1 percent of its maximum value.


CHAPTER 8


Response of Linear Systems to Random Inputs





8—1 Introduction 


The discussion in the preceding chapters has been devoted to finding suitable 
mathematical representations for random functions of time. The next step is to see 
how these mathematical representations can be used to determine the response, or 
output, of a linear system when the input is a random signal rather than a deter- 
ministic one. 

It is assumed that the student is already familiar with the usual methods of 
analyzing linear systems in either the time domain or the frequency domain. These 
methods are restated here in order to clarify the notation, but no attempt is made 
to review all of the essential concepts. The system itself is represented either in 
terms of its impulse response h(t) or its system function H(w), which is just the 
Fourier transform of the impulse response. It is convenient in many cases also to 
use the transfer function H(s), which is the Laplace transform of the impulse 
response. In most cases the initial conditions are assumed to be zero, for conve- 
nience, but any nonzero initial conditions can be taken into account by the usual 
methods if necessary. 

When the input to a linear system is deterministic, either approach will lead to 
a unique relationship between the input and output. When the input to the system 
is a sample function from a random process, there is again a unique relationship 
between the excitation and the response; however, because of its random nature, 
we do not have an explicit representation of the excitation and, therefore, cannot 
obtain an explicit expression for the response. In this case we must be content
with either a probabilistic or a statistical description of the response, just as we
must use this type of description for the random excitation itself.¹ Of these two
approaches, statistical and probabilistic, the statistical approach is the more useful.
In only a very limited class of problems is it possible to obtain a probabilistic 
description of the output based on a probabilistic description of the input, whereas 
in many cases of interest, a statistical model of the output can be obtained readily 
by performing simple mathematical operations on the statistical model of the in- 
put. With the statistical method, such quantities as the mean, correlation function, 
and spectral density of the output can be determined. Only the statistical approach 
is considered in the following sections. 


8—2 Analysis in the Time Domain 


By means of the convolution integral it is possible to determine the response of a 
linear system to a very general excitation. In the case of time-varying systems or 
nonstationary random excitations, or both, the details become quite involved; 
therefore, these cases will not be considered here. To make the analysis more 
realistic we will further restrict our considerations to physically realizable systems 
that are bounded-input/bounded-output stable. If the input time function is designated
as x(t), the system impulse response as h(t), and the output time function as
y(t), as shown in Figure 8-1, then they are all related either by

$$y(t) = \int_0^{\infty} x(t - \lambda)\,h(\lambda)\,d\lambda \tag{8-1}$$

or by

$$y(t) = \int_{-\infty}^{t} x(\lambda)\,h(t - \lambda)\,d\lambda \tag{8-2}$$

The physical realizability and stability constraints on the system are given by

$$h(t) = 0 \qquad t < 0 \tag{8-3}$$

$$\int_{-\infty}^{\infty} |h(t)|\,dt < \infty \tag{8-4}$$

Starting from these specifications, many important characteristics of the output of 
a system excited by a stationary random process can be determined. 

A simple example of time-domain analysis with a deterministic input signal 
serves to review the methods and provides the background for extending these 


¹By a probabilistic description we mean one in which certain probability functions are specified; by a
statistical description we mean one in which certain ensemble averages are specified (for example, 
mean, variance, correlation).


Figure 8-1 Time-domain representation of a linear system.


methods to nondeterministic input signals. Assume that we have a linear system 
whose impulse response is 


$$h(t) = \begin{cases} 5e^{-3t}, & t \ge 0 \\ 0, & t < 0 \end{cases}$$


It is clear that this impulse response satisfies the conditions for physical realiza- 
bility and stability. Let the input be a sample function from a deterministic random 
process having sample functions of the form 


$$X(t) = M + 4\cos(2t + \theta)$$


in which M is a random variable and $\theta$ is an independent random variable that is
uniformly distributed between 0 and $2\pi$. Note that this process is stationary but
not ergodic. Furthermore, since an explicit mathematical form for the input signal 
is known, an explicit mathematical form for the output signal can be obtained 
even though the signal comes from a random process. Hence, this situation is 
quite different from those that form the major concern of this chapter, namely, 
inputs that come from nondeterministic random processes for which no explicit 
mathematical representation is possible. 

Although either (8—1) or (8—2) may be used to determine the system output, 
the latter is used here. Thus, 


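$$Y(t) = \int_{-\infty}^{t} \left[M + 4\cos(2\lambda + \theta)\right] 5e^{-3(t - \lambda)}\,d\lambda$$

which may be integrated to yield

$$Y(t) = \frac{5M}{3} + \frac{20}{13}\left[3\cos(2t + \theta) + 2\sin(2t + \theta)\right]$$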


It is clear from this result that the output of the system is still a sample function 
from a random process and that it contains the same random variables that are 
associated with the input. Furthermore, if probability density functions for these 
random variables are specified, it is possible to determine such statistics of the 
output as the mean and variance. This possibility is illustrated by the Exercises 
that follow. 
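
As a hedged numerical check of the example above, the sketch below evaluates the convolution integral in the form (8-1) with the midpoint rule and compares it with the closed-form output. The function names, the truncation of the integral at 20 seconds, and the particular values of t, M, and theta are assumptions made only for this illustration.

```python
import math

def output_closed_form(t, M, theta):
    """Closed-form output: 5M/3 + (20/13)[3 cos(2t + theta) + 2 sin(2t + theta)]."""
    return 5.0 * M / 3.0 + (20.0 / 13.0) * (
        3.0 * math.cos(2.0 * t + theta) + 2.0 * math.sin(2.0 * t + theta)
    )

def output_numeric(t, M, theta, span=20.0, n=200_000):
    """Midpoint-rule estimate of integral_0^span x(t - lam) h(lam) dlam, with h(lam) = 5 exp(-3 lam)."""
    d = span / n
    total = 0.0
    for k in range(n):
        lam = (k + 0.5) * d
        x = M + 4.0 * math.cos(2.0 * (t - lam) + theta)   # input sample function
        total += x * 5.0 * math.exp(-3.0 * lam)           # times impulse response
    return total * d

t, M, theta = 0.7, 2.0, 1.1   # assumed illustrative values
print(output_numeric(t, M, theta), output_closed_form(t, M, theta))
```

The two printed values should agree closely for any choice of M and theta, which reflects the point made in the text: the random variables pass through to the output explicitly.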
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Exercise 8—2.1 


A linear system has an impulse response of the form

$$h(t) = \begin{cases} te^{-5t}, & t \ge 0 \\ 0, & t < 0 \end{cases}$$

and an input signal that is a sample function from a random process
having sample functions of the form

$$X(t) = M \qquad -\infty < t < \infty$$

in which M is a random variable that is uniformly distributed from -6
to +18.

a) Write an expression for the output sample function.
b) Find the mean value of the output.
c) Find the variance of the output.

Answers: 48/625, 6/25, M/25


Exercise 8-2.2

A linear system has an impulse response of the form

$$h(t) = \begin{cases} 5\delta(t) + 3, & 0 \le t < 1 \\ 0, & \text{elsewhere} \end{cases}$$

The input is a random sample function of the form

$$X(t) = 4\sin(2\pi t + \theta) \qquad -\infty < t < \infty$$

where $\theta$ is a random variable that is uniformly distributed from 0 to $2\pi$.

a) Write an expression for the output sample function.
b) Find the mean value of the output.
c) Find the variance of the output.

Answers: 0, 200, $20\sin(2\pi t + \theta)$


8-3 Mean and Mean-Square Value of System Output


The most convenient form of the convolution integral, when the input X(t) is a
sample function from a nondeterministic random process, is

$$Y(t) = \int_0^{\infty} X(t - \lambda)\,h(\lambda)\,d\lambda \tag{8-5}$$

since the limits of integration do not depend on t. Using this form, consider first
the mean value of the output Y(t). This is given by

$$\bar{Y} = E[Y(t)] = E\left[\int_0^{\infty} X(t - \lambda)\,h(\lambda)\,d\lambda\right] \tag{8-6}$$


The next logical step is to interchange the sequence in which the time integration 
and the expectation are performed; that is, to move the expectation operation in- 
side the integral. Before doing this, however, it is necessary to digress a moment 
and consider the conditions under which such an interchange is justified. 

The problem of finding the expected value of an integral whose integrand con- 
tains a random variable arises many times. In almost all such cases it is desirable 
to be able to move the expectation operation inside the integral and thus simplify 
the integrand. Fortunately, this interchange is possible in almost all cases of prac- 
tical interest and, hence, is used throughout this book with little or no comment. 
It is advisable, however, to be aware of the conditions under which this is possi- 
ble, even though the reasons for these conditions are not fully understood. The 
conditions may be stated as follows: 

If Z(t) is a sample function from a random process (or some function, such as
the square, of the sample function) and f(t) is a nonrandom time function, then

$$E\left[\int_{t_1}^{t_2} Z(t)\,f(t)\,dt\right] = \int_{t_1}^{t_2} E[Z(t)]\,f(t)\,dt$$

if

1. $\int_{t_1}^{t_2} E[|Z(t)|]\,|f(t)|\,dt < \infty$

and

2. Z(t) is bounded on the interval $t_1$ to $t_2$. Note that $t_1$ and $t_2$ may be infinite.
(There is no requirement that Z(t) be from a stationary process.)


In applying this result to the analysis of linear systems the nonrandom function
f(t) is usually the impulse response h(t). For wide-sense stationary input random
processes the quantity $E[|Z(t)|]$ is a constant not dependent on time t. Hence, the
stability condition of (8-4) is sufficient to satisfy condition 1. The boundedness
of Z(t) is always satisfied by physical signals, although there are some mathematical
representations that may not be bounded.

Returning to the problem of finding the mean value of the output of a linear 
system, it follows that 


$$\bar{Y} = \int_0^{\infty} E[X(t - \lambda)]\,h(\lambda)\,d\lambda = \bar{X}\int_0^{\infty} h(\lambda)\,d\lambda \tag{8-7}$$


when the input process is wide-sense stationary. It should be recalled from earlier
work in systems analysis that the area of the impulse response is just the dc gain
of the system, that is, the transfer function of the system evaluated at $\omega = 0$.
Hence, (8-7) simply states the obvious fact that the dc component of the output 
is equal to the dc component of the input times the dc gain of the system. If the 
input random process has zero mean, the output process will also have zero mean. 
If the system does not pass direct current, the output process will always have 
zero mean. 

In order to find the mean-square value of the output we must be able to calcu- 
late the mean value of the product of two integrals. However, such a product may 
always be written as an iterated double integral if the variables of integration are 
kept distinct. Thus, 


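$$\begin{aligned}
\overline{Y^2} = E[Y^2(t)] &= E\left[\int_0^{\infty} X(t - \lambda_1)h(\lambda_1)\,d\lambda_1 \int_0^{\infty} X(t - \lambda_2)h(\lambda_2)\,d\lambda_2\right] \\
&= E\left[\int_0^{\infty} d\lambda_1 \int_0^{\infty} X(t - \lambda_1)X(t - \lambda_2)h(\lambda_1)h(\lambda_2)\,d\lambda_2\right] && \text{(8-8)} \\
&= \int_0^{\infty} d\lambda_1 \int_0^{\infty} E[X(t - \lambda_1)X(t - \lambda_2)]\,h(\lambda_1)h(\lambda_2)\,d\lambda_2 && \text{(8-9)}
\end{aligned}$$

in which the subscripts on $\lambda_1$ and $\lambda_2$ have been introduced to keep the variables
of integration distinct. The expected value inside the double integral is simply the
autocorrelation function for the input random process; that is,

$$E[X(t - \lambda_1)X(t - \lambda_2)] = R_X(t - \lambda_1 - t + \lambda_2) = R_X(\lambda_2 - \lambda_1)$$

Hence, (8-9) becomes

$$\overline{Y^2} = \int_0^{\infty} d\lambda_1 \int_0^{\infty} R_X(\lambda_2 - \lambda_1)\,h(\lambda_1)h(\lambda_2)\,d\lambda_2 \tag{8-10}$$
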

Although (8—10) is usually not difficult to evaluate, since both Ry(7) and h(t) 
are likely to contain only exponentials, it is frequently very tedious to carry out 
the details. This is because such autocorrelation functions often have discontinuous
derivatives at the origin (which in this case is $\lambda_1 = \lambda_2$) and thus the integral
must be broken up into several ranges. This point is illustrated later. At the moment,
however, it is instructive to consider a much simpler situation, one in
which the input is a sample function from white noise. For this case, it was shown
in Section 7-7 that

$$R_X(\tau) = S_0\,\delta(\tau)$$

where $S_0$ is the two-sided spectral density of the white noise. Hence (8-10) becomes

$$\overline{Y^2} = \int_0^{\infty} d\lambda_1 \int_0^{\infty} S_0\,\delta(\lambda_2 - \lambda_1)\,h(\lambda_1)h(\lambda_2)\,d\lambda_2 \tag{8-11}$$

Integrating over $\lambda_2$ yields

$$\overline{Y^2} = S_0 \int_0^{\infty} h^2(\lambda)\,d\lambda \tag{8-12}$$

Hence, for this case it is the area of the square of the impulse response that is
significant.²

As a means of illustrating some of these ideas with a simple example, consider 
the single-section, low-pass RC circuit shown in Figure 8-2. The mean value of 
the output is, from (8-7), 


$$\bar{Y} = \bar{X}\int_0^{\infty} b e^{-b\lambda}\,d\lambda = \bar{X}\,b\left[\frac{e^{-b\lambda}}{-b}\right]_0^{\infty} = \bar{X} \tag{8-13}$$

This result is obviously correct, since it is apparent by inspection that the dc gain
of this circuit is unity.

Next consider the mean-square value of the output when the input is white
noise. From (8-12), this is

$$\overline{Y^2} = S_0\int_0^{\infty} b^2 e^{-2b\lambda}\,d\lambda = b^2 S_0\left[\frac{e^{-2b\lambda}}{-2b}\right]_0^{\infty} = \frac{bS_0}{2} \tag{8-14}$$

Note that the parameter b, which is the reciprocal of the time constant, is also
related to the half-power bandwidth of the system. In particular, this bandwidth B
is

$$B = \frac{1}{2\pi RC} = \frac{b}{2\pi} \quad \text{Hz}$$

so that (8-14) could be written as

$$\overline{Y^2} = \pi B S_0 \tag{8-15}$$

[Figure 8-2: RC circuit with input X(t) and output Y(t). $H(s) = \dfrac{1/Cs}{R + 1/Cs} = \dfrac{b}{s + b}$, where $b = 1/RC$; $h(t) = be^{-bt}$ for $t \ge 0$ and $h(t) = 0$ for $t < 0$.]

Figure 8-2 Simple RC circuit and its impulse response.

²It should be noted that, for some functions, this integral can diverge even when (8-4) is satisfied.
This occurs, for instance, whenever h(t) contains $\delta$ functions. The high-pass RC circuit is an example
of this.



It is evident from the above that the mean-square value of the output of this 
system increases linearly with the bandwidth of the system. Such results are typical
whenever the bandwidth of the input random process is large compared with
the bandwidth of the system. 
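
The white-noise result (8-12) and its RC special cases (8-14) and (8-15) lend themselves to a quick numerical check. The sketch below is illustrative only: the helper name `mean_square_white_noise`, the midpoint-rule integration, and the values b = 100 and S0 = 0.5 are assumptions, not taken from the text.

```python
import math

def mean_square_white_noise(h, s0, t_max, n=200_000):
    """Midpoint-rule estimate of S0 * integral_0^t_max h(t)^2 dt, following (8-12)."""
    dt = t_max / n
    return s0 * dt * sum(h((k + 0.5) * dt) ** 2 for k in range(n))

b = 100.0    # reciprocal of the RC time constant in 1/s (assumed value)
s0 = 0.5     # two-sided spectral density of the white noise (assumed value)

def h_rc(t):
    """Impulse response of the low-pass RC circuit, zero for negative time."""
    return b * math.exp(-b * t) if t >= 0.0 else 0.0

numeric = mean_square_white_noise(h_rc, s0, t_max=40.0 / b)
closed_form = b * s0 / 2.0                               # (8-14)
bandwidth_form = math.pi * (b / (2.0 * math.pi)) * s0    # (8-15) with B = b / (2 pi)
print(numeric, closed_form, bandwidth_form)
```

Doubling b in this sketch doubles all three values, which is the linear dependence on bandwidth noted above.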

We might consider next a situation in which the input sample function is not 
from a white noise process. In this case the complete double integral of (8—10) 
must be evaluated. However, this is likely to be a tedious operation and is, in 
fact, just a special case of the more general problem of obtaining the autocorre- 
lation function of the output. Since obtaining the complete autocorrelation func- 
tion is only slightly more involved than finding just the mean-square value of the 
output, this task is postponed until the next section. 





Exercise 8-3.1

A linear system has an impulse response of

$$h(t) = te^{-3t}\,u(t)$$

where u(t) is the unit step function. The input to this system is a sample
function from a white noise process having a two-sided spectral
density of 4 V²/Hz plus a dc component of 2 V.

a) Find the mean value of the output of the system.
b) Find the variance of the output.
c) Find the mean-square value of the output.

Answers: 0.0370, 0.0864, 0.2222

Exercise 8-3.2

White noise having a two-sided spectral density of 0.25 V²/Hz is applied
to the input of a finite-time integrator whose impulse response is

$$h(t) = 5[u(t) - u(t - 0.2)]$$

a) Find the mean value of the output of the system.
b) Find the mean-square value of the output.

Answers: 0, 1.25





8-4 Autocorrelation Function of System Output


A problem closely related to that of finding the mean-square value is the determination
of the autocorrelation function at the output of the system. By definition,
this autocorrelation function is

$$R_Y(\tau) = E[Y(t)Y(t + \tau)]$$

Following the same steps as in (8-9), except for replacing t by $t + \tau$ in one
factor, the autocorrelation function may be written as

$$R_Y(\tau) = \int_0^{\infty} d\lambda_1 \int_0^{\infty} E[X(t - \lambda_1)X(t + \tau - \lambda_2)]\,h(\lambda_1)h(\lambda_2)\,d\lambda_2 \tag{8-16}$$

In this case the expected value inside the integral is

$$E[X(t - \lambda_1)X(t + \tau - \lambda_2)] = R_X(t - \lambda_1 - t - \tau + \lambda_2) = R_X(\lambda_2 - \lambda_1 - \tau)$$

Hence, the output autocorrelation function becomes

$$R_Y(\tau) = \int_0^{\infty} d\lambda_1 \int_0^{\infty} R_X(\lambda_2 - \lambda_1 - \tau)\,h(\lambda_1)h(\lambda_2)\,d\lambda_2 \tag{8-17}$$

Note the similarity between this result and that for the mean-square value. In
particular, for $\tau = 0$, this reduces exactly to (8-10), as it must.

For the special case of white noise into the system, the expression for the output
autocorrelation function becomes much simpler. Let

$$R_X(\tau) = S_0\,\delta(\tau)$$

as before, and substitute into (8-17). Thus,

$$\begin{aligned}
R_Y(\tau) &= \int_0^{\infty} d\lambda_1 \int_0^{\infty} S_0\,\delta(\lambda_2 - \lambda_1 - \tau)\,h(\lambda_1)h(\lambda_2)\,d\lambda_2 \\
&= S_0 \int_0^{\infty} h(\lambda_1)h(\lambda_1 + \tau)\,d\lambda_1 && \text{(8-18)}
\end{aligned}$$


Hence, for the white-noise case, the output autocorrelation function is proportional 
to the time correlation function of the impulse response. 

This point can be illustrated by means of the linear system of Figure 8—2 and a 
white-noise input. Thus, 


$$\begin{aligned}
R_Y(\tau) &= S_0 \int_0^{\infty} (be^{-b\lambda})\left[be^{-b(\lambda + \tau)}\right] d\lambda \\
&= b^2 S_0\,e^{-b\tau}\left[\frac{e^{-2b\lambda}}{-2b}\right]_0^{\infty} = \frac{bS_0}{2}e^{-b\tau} \qquad \tau \ge 0 && \text{(8-19)}
\end{aligned}$$

This result is valid only for $\tau \ge 0$. When $\tau < 0$, the range of integration must be
altered because the impulse response is always zero for negative values of the
argument. The situation can be made clearer by means of the two sketches shown
in Figure 8-3, which show the factors in the integrand of (8-18) for both ranges
of $\tau$. The integrand is zero, of course, when either factor is zero. When $\tau < 0$,
the integral becomes

$$\begin{aligned}
R_Y(\tau) &= S_0 \int_{-\tau}^{\infty} (be^{-b\lambda})\left[be^{-b(\lambda + \tau)}\right] d\lambda \\
&= b^2 S_0\,e^{-b\tau}\left[\frac{e^{-2b\lambda}}{-2b}\right]_{-\tau}^{\infty} = \frac{bS_0}{2}e^{b\tau} \qquad \tau \le 0 && \text{(8-20)}
\end{aligned}$$

Figure 8-3 Factors in the integrand of (8-18) when the RC circuit of Figure 8-2
is used: (a) $\tau > 0$, (b) $\tau < 0$.

From (8-19) and (8-20), the complete autocorrelation function can be written as

$$R_Y(\tau) = \frac{bS_0}{2}e^{-b|\tau|} \qquad -\infty < \tau < \infty \tag{8-21}$$

It is now apparent that the calculation for $\tau < 0$ was needless. Since the autocorrelation
function is an even function of $\tau$, the complete form could have been
obtained immediately from the case for $\tau \ge 0$. This procedure is followed in the
future.
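
The evenness of the output autocorrelation function can also be seen numerically. The following sketch, under assumed values b = 2 and S0 = 3 and with a hypothetical helper name, evaluates the time correlation of the impulse response in (8-18) directly for a positive and a negative lag, letting the causal impulse response supply the change in integration limits.

```python
import math

def output_autocorr_white(h, s0, tau, t_max, n):
    """Midpoint-rule estimate of S0 * integral_0^t_max h(lam) h(lam + tau) dlam, following (8-18)."""
    d = t_max / n
    return s0 * d * sum(h((k + 0.5) * d) * h((k + 0.5) * d + tau) for k in range(n))

b, s0 = 2.0, 3.0   # assumed illustrative values

def h_rc(t):
    """Causal RC impulse response: zero for negative argument."""
    return b * math.exp(-b * t) if t >= 0.0 else 0.0

def closed_form(tau):
    """Complete autocorrelation function (8-21)."""
    return 0.5 * b * s0 * math.exp(-b * abs(tau))

r_pos = output_autocorr_white(h_rc, s0, 0.5, 20.0, 200_000)
r_neg = output_autocorr_white(h_rc, s0, -0.5, 20.0, 200_000)
print(r_pos, r_neg, closed_form(0.5))
```

For the negative lag the second factor vanishes until the integration variable exceeds the magnitude of the lag, which is exactly the altered lower limit in (8-20).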

It is desirable to consider at least one example in which the input random pro- 
cess is not white. In so doing, it is possible to illustrate some of the integration 
problems that develop and, at the same time, use the results to infer something 
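about the validity and usefulness of the white-noise approximation. For this purpose,
assume that the input random process to the RC circuit of Figure 8-2 has
an autocorrelation function of the form

$$R_X(\tau) = \frac{\beta S_0}{2}e^{-\beta|\tau|} \qquad -\infty < \tau < \infty \tag{8-22}$$

The coefficient $\beta S_0/2$ has been selected so that this random process has a spectral
density at $\omega = 0$ of $S_0$; see (7-41) and Figure 7-7(b). Thus, at low frequencies,
the spectral density is the same as the white-noise spectrum previously assumed.

In order to determine the appropriate ranges of integration, it is desirable to
look at the autocorrelation function $R_X(\lambda_2 - \lambda_1 - \tau)$, as a function of $\lambda_2$ for $\tau > 0$.
This is shown in Figure 8-4. Since $\lambda_1$ is always positive for the evaluation
of (8-17), it is clear that the ranges of integration should be from 0 to $(\lambda_1 + \tau)$
and from $(\lambda_1 + \tau)$ to $\infty$. Hence, (8-17) may be written as

$$\begin{aligned}
R_Y(\tau) &= \int_0^{\infty} d\lambda_1 \int_0^{\lambda_1 + \tau} R_X(\lambda_2 - \lambda_1 - \tau)\,h(\lambda_1)h(\lambda_2)\,d\lambda_2 + \int_0^{\infty} d\lambda_1 \int_{\lambda_1 + \tau}^{\infty} R_X(\lambda_2 - \lambda_1 - \tau)\,h(\lambda_1)h(\lambda_2)\,d\lambda_2 \\
&= \frac{b^2 \beta S_0}{2} e^{-\beta\tau} \int_0^{\infty} e^{-(b + \beta)\lambda_1}\,d\lambda_1 \int_0^{\lambda_1 + \tau} e^{-(b - \beta)\lambda_2}\,d\lambda_2 + \frac{b^2 \beta S_0}{2} e^{\beta\tau} \int_0^{\infty} e^{-(b - \beta)\lambda_1}\,d\lambda_1 \int_{\lambda_1 + \tau}^{\infty} e^{-(b + \beta)\lambda_2}\,d\lambda_2 \\
&= \frac{b^2 \beta S_0}{2(b - \beta)} e^{-\beta\tau} \int_0^{\infty} e^{-(b + \beta)\lambda_1}\left[1 - e^{-(b - \beta)(\lambda_1 + \tau)}\right] d\lambda_1 + \frac{b^2 \beta S_0}{2(b + \beta)} e^{\beta\tau} \int_0^{\infty} e^{-(b - \beta)\lambda_1} e^{-(b + \beta)(\lambda_1 + \tau)}\,d\lambda_1 \\
&= \frac{b^2 \beta S_0}{2(b - \beta)}\left[\frac{e^{-\beta\tau}}{b + \beta} - \frac{e^{-b\tau}}{2b}\right] + \frac{b^2 \beta S_0}{2(b + \beta)}\left[\frac{e^{-b\tau}}{2b}\right] \\
&= \frac{b^2 \beta S_0}{2(b^2 - \beta^2)}\left[e^{-\beta\tau} - \frac{\beta}{b}e^{-b\tau}\right] \qquad \tau > 0 && \text{(8-23)}
\end{aligned}$$

Figure 8-4 Autocorrelation function to be used in (8-17).

From symmetry, the expression for $\tau < 0$ can be written directly. The final result
is

$$R_Y(\tau) = \frac{b^2 \beta S_0}{2(b^2 - \beta^2)}\left[e^{-\beta|\tau|} - \frac{\beta}{b}e^{-b|\tau|}\right] \tag{8-24}$$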


In order to compare this result with the previously obtained result for white
noise at the input, it is only necessary to let $\beta$ approach infinity. In this case,

$$\lim_{\beta \to \infty} R_Y(\tau) = \frac{bS_0}{2}e^{-b|\tau|} \tag{8-25}$$

which is exactly the same as (8-21). Of greater interest, however, is the case
when $\beta$ is large compared to b but still finite. This corresponds to the physical
situation in which the bandwidth of the input random process is large compared
to the bandwidth of the system. In order to make this comparison, write (8-24)
as

$$R_Y(\tau) = \frac{bS_0}{2}e^{-b|\tau|}\left[\frac{1}{1 - b^2/\beta^2}\right]\left[1 - \frac{b}{\beta}e^{-(\beta - b)|\tau|}\right] \tag{8-26}$$

The first factor in (8-26) is the autocorrelation function of the output when the
input is white noise. The second factor is the one by which the true autocorrelation
of the system output differs from the white-noise approximation. It is clear that as
$\beta$ becomes large compared to b, this factor approaches unity.

The point to this discussion is that there are many practical situations in which 
the input noise has a bandwidth that is much greater than the system bandwidth, 
and in these cases it is quite reasonable to use the white-noise approximation. In
doing so, there is a great saving in labor without much loss in accuracy; for
example, in a high-gain amplifier with a bandwidth of 10 MHz, the most important
source of noise is shot noise in the first stage, which may have a bandwidth
of 1000 MHz. Hence, the factor $b/\beta$ in (8-26), assuming that this form applies,
will be only 0.01, and the error in using the white-noise approximation will not
exceed 1 percent.
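
The size of this error can be tabulated directly from the second factor of (8-26). The sketch below is a hedged illustration: it assumes that the 10 MHz and 1000 MHz figures map onto b and beta as b = 2 pi (10 MHz) and beta = 2 pi (1000 MHz), and the function name and the grid of lags are hypothetical choices made only for this example.

```python
import math

def correction_factor(tau, b, beta):
    """Second factor of (8-26): exact output autocorrelation divided by the white-noise result (8-21)."""
    return (1.0 / (1.0 - (b / beta) ** 2)) * (
        1.0 - (b / beta) * math.exp(-(beta - b) * abs(tau))
    )

b = 2.0 * math.pi * 10.0e6        # system parameter in rad/s (assumed mapping of 10 MHz)
beta = 2.0 * math.pi * 1000.0e6   # noise parameter in rad/s (assumed mapping of 1000 MHz)

taus = [k * 1.0e-10 for k in range(2001)]   # lags from 0 to 200 ns
worst = max(abs(correction_factor(t, b, beta) - 1.0) for t in taus)
print(worst)   # largest fractional departure from the white-noise result on this grid
```

On this grid the largest departure occurs at zero lag, where the factor reduces to 1/(1 + b/beta).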





Exercise 8-4.1

For the white noise and finite-time integrator of Exercise 8-3.2 find
the value of the autocorrelation function of the system output at

a) $\tau = 0$
b) $\tau = 0.1$
c) $\tau = 0.21$

Answers: 0, 0.625, 1.25


Exercise 8-4.2

A linear system has an impulse response of

$$h(t) = 3e^{-3t}\,u(t)$$

The input to this system is a sample function from a random process
having an autocorrelation function of

$$R_X(\tau) = 2e^{-4|\tau|}$$

Find the value of the autocorrelation function of the output of the system
for

a) $\tau = 0$
b) $\tau = 0.5$
c) $\tau = 1$

Answers: 0.1236, 0.4170, 0.8571


8-5 Crosscorrelation Between Input and Output


When a sample function from a random process is applied to the input of a linear 
system, the output must be related in some way to the input. Hence, they will be 
correlated, and the nature of the crosscorrelation function is important. In fact, it 
will be shown very shortly that this relationship can be used to provide a practical 
technique for measuring the impulse response of any linear system. 

One of the crosscorrelation functions for input and output is defined by 


$$R_{XY}(\tau) = E[X(t)Y(t + \tau)] \tag{8-27}$$

which, in integral form, becomes 

$$R_{XY}(\tau) = E\left[ X(t) \int_0^\infty X(t + \tau - \lambda)\, h(\lambda)\, d\lambda \right] \tag{8-28}$$

Since X(t) is not a function of λ, it may be moved inside the integral and then the 
expectation may be moved inside. Thus, 

$$R_{XY}(\tau) = \int_0^\infty E[X(t)X(t + \tau - \lambda)]\, h(\lambda)\, d\lambda = \int_0^\infty R_X(\tau - \lambda)\, h(\lambda)\, d\lambda \tag{8-29}$$

Hence, this crosscorrelation function is just the convolution of the input 
autocorrelation function and the impulse response of the system. 

The other crosscorrelation function is 

$$\begin{aligned} R_{YX}(\tau) &= E[X(t + \tau)Y(t)] = E\left[ X(t + \tau) \int_0^\infty X(t - \lambda)\, h(\lambda)\, d\lambda \right] \\ &= \int_0^\infty E[X(t + \tau)X(t - \lambda)]\, h(\lambda)\, d\lambda \\ &= \int_0^\infty R_X(\tau + \lambda)\, h(\lambda)\, d\lambda \end{aligned} \tag{8-30}$$

Since the autocorrelation function in (8-30) is symmetrical about λ = -τ and 
the impulse response is zero for negative values of λ, this crosscorrelation 
function will always be different from $R_{XY}(\tau)$. They will, however, have the same 
value at τ = 0. 

A simple example will serve to illustrate the calculation of this type of cross- 
correlation function. If we consider the system of Figure 8—2 and an input from a 
random process having the autocorrelation function given by (8—22), the crosscor- 
relation function can be expressed as 
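$$R_{XY}(\tau) = \int_0^\tau \left[ \frac{\beta S_0}{2}\, e^{-\beta(\tau - \lambda)} \right] \left[ b e^{-b\lambda} \right] d\lambda + \int_\tau^\infty \left[ \frac{\beta S_0}{2}\, e^{-\beta(\lambda - \tau)} \right] \left[ b e^{-b\lambda} \right] d\lambda \tag{8-31}$$

when τ > 0. The integration is now straightforward and yields 

$$R_{XY}(\tau) = \frac{\beta b S_0}{2} \left[ \frac{2\beta}{\beta^2 - b^2}\, e^{-b\tau} - \frac{1}{\beta - b}\, e^{-\beta\tau} \right] \qquad \tau > 0 \tag{8-32}$$

For τ < 0 the integration is even simpler. 

$$R_{XY}(\tau) = \int_0^\infty \left[ \frac{\beta S_0}{2}\, e^{-\beta(\lambda - \tau)} \right] \left[ b e^{-b\lambda} \right] d\lambda \tag{8-33}$$

Carrying out this integration leads to 

$$R_{XY}(\tau) = \frac{\beta b S_0}{2(\beta + b)}\, e^{\beta\tau} \qquad \tau < 0 \tag{8-34}$$

The other crosscorrelation function can be obtained from 

$$R_{YX}(\tau) = R_{XY}(-\tau) \tag{8-35}$$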
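The closed forms (8-32) and (8-34) can be compared with a direct numerical evaluation of the convolution integral (8-29). The sketch below is illustrative only: the function names, the parameter values, and the midpoint-rule integration are assumptions, and the exponential input autocorrelation is taken to be the one from (8-22) with the RC impulse response h(t) = b e^{-bt} u(t).

```python
import math

def rx(tau, beta, s0):
    """Input autocorrelation assumed from (8-22): (beta*S0/2) * exp(-beta*|tau|)."""
    return 0.5 * beta * s0 * math.exp(-beta * abs(tau))

def rxy_numeric(tau, b, beta, s0, upper=40.0, steps=40000):
    """Midpoint-rule evaluation of (8-29) with h(t) = b*exp(-b*t)*u(t)."""
    dlam = upper / steps
    total = 0.0
    for i in range(steps):
        lam = (i + 0.5) * dlam
        total += rx(tau - lam, beta, s0) * b * math.exp(-b * lam)
    return total * dlam

def rxy_closed(tau, b, beta, s0):
    """Closed forms: (8-32) for tau > 0, (8-34) for tau <= 0."""
    k = 0.5 * beta * b * s0
    if tau > 0:
        return k * (2.0 * beta / (beta ** 2 - b ** 2) * math.exp(-b * tau)
                    - math.exp(-beta * tau) / (beta - b))
    return k / (beta + b) * math.exp(beta * tau)

# Illustrative parameter values (not from the text).
b, beta, s0 = 1.0, 3.0, 2.0
for tau in (-1.0, 0.0, 0.5):
    print(tau, rxy_numeric(tau, b, beta, s0), rxy_closed(tau, b, beta, s0))
```

Because of (8-35), the same two functions also give the other crosscorrelation function when evaluated at -τ.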


The above results become even simpler when the input to the system is consid- 
ered to be a sample function of white noise. For this case, 


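$$R_X(\tau) = S_0\, \delta(\tau)$$

and 

$$R_{XY}(\tau) = \int_0^\infty S_0\, \delta(\tau - \lambda)\, h(\lambda)\, d\lambda = \begin{cases} S_0 h(\tau) & \tau \ge 0 \\ 0 & \tau < 0 \end{cases} \tag{8-36}$$

Likewise, 

$$R_{YX}(\tau) = \int_0^\infty S_0\, \delta(\tau + \lambda)\, h(\lambda)\, d\lambda = \begin{cases} 0 & \tau > 0 \\ S_0 h(-\tau) & \tau \le 0 \end{cases} \tag{8-37}$$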


It is the result shown in (8—36) that leads to the procedure for measuring the 
impulse response, which will be discussed next. 

Consider the block diagram shown in Figure 8-5. The input signal X(t) is a 
sample function from a random process whose bandwidth is large compared to the 
bandwidth of the system to be measured. In practice, a bandwidth ratio of 10 to 
1 gives very good results. For purposes of analysis this input will be assumed to 
be white. 
[Figure 8-5: block diagram. The white-noise input X(t) drives the system to be 
measured, h(t), whose output is Y(t). X(t) also passes through an ideal delay of 
τ seconds. The product Z(t) is applied to a low-pass filter, whose output is 
⟨Z(t)⟩.] 

Figure 8-5 Method for measuring the impulse response of a linear system. 


In addition to being applied to the system under test, this input signal is also 
delayed by τ seconds. If the complete impulse response is desired, then τ must be 
variable over a range from zero to a value at which the impulse response has 
become negligibly small. Several different techniques exist for obtaining such 
delay. An analog technique employs a magnetic recording drum on which the 
playback head can be displaced by a variable amount around the drum from the 
recording head. More modern techniques, however, would sample the signals at a 
rate that is at least twice the signal bandwidth and then delay the samples in a 
charge-coupled delay line or a switched capacitor delay line. Alternatively, the 
samples might be quantized into a finite number of amplitude levels (see Sec. 
2-7) and then delayed by means of shift registers. For purposes of the present 
discussion, we simply assume the output of the delay device to be X(t - τ). 

The system output Y(t) and the delay unit output are then multiplied to form 
Z(t) = X(t - τ)Y(t), which is then passed through a low-pass filter. If the 
bandwidth of the lowpass filter is sufficiently small, its output will be mostly just the 
dc component of Z(t), with a small random component added to it. For an ergodic 
input process, Z(t) will be ergodic* and the dc component of Z(t) (that is, its time 
average) will be the same as its expected value. Thus, 


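$$\langle Z(t) \rangle = E[Z(t)] = E[Y(t)X(t - \tau)] = R_{XY}(\tau) \tag{8-38}$$

since in the stationary case 

$$E[Y(t)X(t - \tau)] = E[X(t)Y(t + \tau)] = R_{XY}(\tau) \tag{8-39}$$

But from (8-36), it is seen that 

$$\langle Z(t) \rangle = \begin{cases} S_0 h(\tau) & \tau \ge 0 \\ 0 & \tau < 0 \end{cases}$$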


*This is true for a time-invariant system and a fixed delay τ. 
Hence, the dc component at the output of the lowpass filter is proportional to the 
impulse response evaluated at the τ determined by the delay. If τ can be changed, 
then the complete impulse response of the system can be measured. 

At first thought, this method of measuring the impulse response may seem like 
the hard way to solve an easy problem; it should be much easier simply to apply 
an impulse (or a reasonable approximation thereto) and observe the output. How- 
ever, there are at least two reasons why this direct procedure may not be possible 
or desirable. In the first place, an impulse with sufficient area to produce an ob- 
servable output may also drive the system into a region of nonlinear operation 
well outside its intended operating range. Second, it may be desired to monitor 
the impulse response of the system continuously while it is in normal operation. 
Repeated applications of impulses may seriously affect this normal operation. In 
the crosscorrelation method, however, the random input signal can usually be 
made small enough to have a negligible effect on the operation. 

Some practical engineering situations in which this method has been success- 
fully used include automatic control systems, chemical process control, and mea- 
surement of aircraft characteristics in flight. One of the more exotic applications 
is the continuous monitoring of the impulse response of a nuclear reactor in order 
to observe how close it is to critical (that is, unstable). It is also being used to 
measure the dynamic response of large buildings to earth tremors or wind gusts. 





Exercise 8—5.1 
For the white noise input and system impulse response of Exercise 
8-4.1, evaluate both crosscorrelation functions at the same values 
of τ. 


Answers: 0, 0, 0, 0, 1.25, 1.25 


Exercise 8—5.2 


For the input noise and system impulse response of Exercise 8-4.2, 
evaluate both crosscorrelation functions for the same values of τ. 


Answers: — 0.4167, —0.1731, 0.0157, 0.1160, 0.8571, 0.8571 
8—6 Examples of Time-Domain System Analysis 


A simple RC circuit responding to a random input having an exponential autocor- 
relation function was analyzed in Section 8—4 and was found to involve an appre- 
ciable amount of labor. Actually, systems and inputs such as these are usually 
handled more conveniently by the frequency-domain methods that are discussed 
later in this chapter. Hence, it seems desirable to look at some situation in which 
time-domain methods are easier. These situations occur when the impulse re- 
sponse and autocorrelation function have a simple form over a finite time interval. 

The system chosen for this example is the finite-time integrator, whose impulse 
response is shown in Figure 8—6(a). The input is assumed to have an autocorre- 
lation function of the form shown in Figure 8—6(b). This autocorrelation function 
might come from the random binary process discussed in Section 6—2, for exam- 
ple. 

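For the particular input specified, the output of the finite-time integrator will 
have zero mean, since $\bar{X}$ is zero. In the more general case, however, the mean 
value of the output would be, from (8-7), 

$$\bar{Y} = \bar{X} \int_0^T \frac{1}{T}\, dt = \bar{X} \tag{8-40}$$

Since the input process is not white, (8-10) must be used to determine the 
mean-square value of the output. Thus, 

$$\overline{Y^2} = \int_0^T d\lambda_2 \int_0^T A^2 \left[ 1 - \frac{|\lambda_2 - \lambda_1|}{T} \right] \left( \frac{1}{T} \right)^2 d\lambda_1 \tag{8-41}$$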


[Figure 8-6: two sketches, (a) h(t) and (b) $R_X(\tau)$.] 

Figure 8-6 (a) Impulse response of finite-time integrator and (b) input 
autocorrelation function. 

[Figure 8-7: sketch of the integrand over the region of integration.] 

Figure 8-7 Integrand of (8-41). 

As an aid in evaluating this integral, it is helpful to sketch the integrand as shown 
in Figure 8-7 and note that the mean-square value is just the volume of the region 
indicated. Since this volume is composed of the volumes of two right pyramids, 
each having a base of A²/T² by √2 T and an altitude of T/√2, the total volume 
is seen to be 

$$\overline{Y^2} = 2 \left\{ \frac{1}{3} \left( \frac{A^2}{T^2} \right) \left( \sqrt{2}\, T \right) \left( \frac{T}{\sqrt{2}} \right) \right\} = \frac{2}{3} A^2 \tag{8-42}$$
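The pyramid-volume argument can be checked by evaluating the double integral numerically. This is a rough sketch under a stated assumption: the input autocorrelation is taken to be the triangle A²(1 - |τ|/T) for |τ| ≤ T, which is what the quoted base and altitude imply but which is not reproduced from Figure 8-6 here. The function name and the grid size are hypothetical.

```python
def mean_square_output(a, t_int, n=400):
    """Midpoint-rule evaluation of the double integral for the mean-square output."""
    d = t_int / n
    total = 0.0
    for i in range(n):
        lam1 = (i + 0.5) * d
        for j in range(n):
            lam2 = (j + 0.5) * d
            # Assumed triangular input autocorrelation; |lam2 - lam1| never exceeds T here.
            r_x = a * a * (1.0 - abs(lam2 - lam1) / t_int)
            total += r_x / t_int ** 2
    return total * d * d

a, t_int = 2.0, 0.5
print(mean_square_output(a, t_int), 2.0 * a * a / 3.0)
```

The result does not depend on T, in agreement with the closed form (2/3)A².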


It is also possible to obtain the autocorrelation function of the output by using 
(8—17). Thus, 


$$R_Y(\tau) = \int_0^T d\lambda_1 \int_0^T R_X(\lambda_2 - \lambda_1 - \tau) \left( \frac{1}{T} \right)^2 d\lambda_2 \tag{8-43}$$


It is left as an exercise for the reader to show that this has the shape shown in 
Figure 8—8 and is composed of segments of cubics. 

It may be noted that the results become even simpler when the input random 
process can be treated as if it were white noise. Thus, using the special case 
derived in (8—12), the mean-square value of the output would be 


$$\overline{Y^2} = S_0 \int_0^T \left( \frac{1}{T} \right)^2 d\lambda = \frac{S_0}{T} \tag{8-44}$$





where $S_0$ is the spectral density of the input white noise. Furthermore, from the 
special case derived in (8-18), the output autocorrelation function can be sketched 
by inspection, as shown in Figure 8-9, since it is just the time correlation of the 
system impulse response with itself. Note that this result indicates another way in 
which a random process having a triangular autocorrelation function might be 
generated. 

[Figure 8-8: sketch of the output autocorrelation function against τ over the 
range -2T to 2T.] 

Figure 8-8 Autocorrelation function of the output of the finite-time integrator. 

The second example utilizes the result of (8—14) to determine the amount of 
filtering required to make a good measurement of a small dc voltage in the pres- 
ence of a large noise signal. Such a situation might arise in any system that at- 
tempts to measure the crosscorrelation between two signals that are only slightly 
correlated. Specifically, it is assumed that the signal appearing at the input to the 
RC circuit of Figure 8-2 has the form 


$$X(t) = A + N(t)$$

where the noise N(t) has an autocorrelation function of 

$$R_N(\tau) = 10 e^{-1000|\tau|}$$


It is desired to measure A with an rms error of 1 percent when A itself is on the 
order of 1 and it is necessary to determine the time-constant of the RC filter 
required to achieve this degree of accuracy. 

Although an exact solution could be obtained by using the results of the exact 
analysis that culminated in (8—24), this approach is needlessly complicated. If it 
is recognized that the variance of the noise at the output of the filter must be very 
much smaller than that at the input, then it is clear that the bandwidth of the filter 
must also be very much smaller than the bandwidth of the input noise. Under 
these conditions the white-noise assumption for the input must be a very good 
approximation. 

The first step in using this approximation is to find the spectral density of the 
noise in the vicinity of ω = 0, since only frequency components in this region 
will be passed by the RC filter. Although this spectral density can be obtained 
directly by analogy to (8—22), the more general approach is employed here. It was 
shown in (7—40) that the spectral density is related to the autocorrelation function 
by 


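$$S_N(\omega) = \int_{-\infty}^{\infty} R_N(\tau)\, e^{-j\omega\tau}\, d\tau$$

At ω = 0, this becomes 

$$S_N(0) = \int_{-\infty}^{\infty} R_N(\tau)\, d\tau = 2 \int_0^\infty R_N(\tau)\, d\tau \tag{8-45}$$

[Figure 8-9: sketch of $R_Y(\tau)$ against τ, with axis marks at -T, 0, and T.] 

Figure 8-9 Output autocorrelation function with white-noise input. 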


Hence, the spectral density of the assumed white-noise input would be this same 
value, that is, $S_0 = S_N(0)$. Note that (8-45) is a general result that does not 
depend upon the form of the autocorrelation function. In the particular case being 
considered here, it follows that 

$$S_0 = 2(10) \int_0^\infty e^{-1000\tau}\, d\tau = \frac{20}{1000} = 0.02$$

From (8-14) it is seen that the mean-square value of the filter output, $N_{out}(t)$, 
will be 

$$\overline{N_{out}^2} = \frac{b S_0}{2} = \frac{b(0.02)}{2} = 0.01b$$

In order to achieve the desired accuracy of 1 percent it is necessary that 

$$\sqrt{\overline{N_{out}^2}} \le (0.01)(1.0)$$

when A is 1.0, since the dc gain of this filter is unity. Thus, 

$$\overline{N_{out}^2} = 0.01b \le 10^{-4}$$

so that 

$$b \le 10^{-2}$$

Since b = 1/RC, it follows that 

$$RC \ge 10^2$$


in order to obtain the desired accuracy. 
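The steps of this design calculation can be collected into a few lines. The sketch below is only an illustration of the white-noise approximation used above: the function name `rc_time_constant` and its parameters are hypothetical, and the noise autocorrelation is assumed to have the exponential form r0·exp(-alpha|τ|).

```python
def rc_time_constant(r0, alpha, rms_error):
    """Smallest RC meeting an rms error target for a unity-dc-gain RC filter.

    Assumes noise autocorrelation r0*exp(-alpha*|tau|) and a filter bandwidth
    much smaller than alpha, so the noise may be treated as white.
    """
    s0 = 2.0 * r0 / alpha              # (8-45): twice the area under R_N for tau > 0
    b = 2.0 * rms_error ** 2 / s0      # from mean-square output = b*S0/2
    return 1.0 / b, b, s0

rc, b, s0 = rc_time_constant(r0=10.0, alpha=1000.0, rms_error=0.01)
print(f"S0 = {s0}, b = {b}, RC = {rc}")
print(f"bandwidth ratio b/alpha = {b / 1000.0}")
```

The printed ratio b/alpha is very small, which supports the use of the white-noise approximation for this example.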

It was noted that crosscorrelation of the input and output of a linear system 
yields an estimate of the system impulse response when the input has a bandwidth 
that is large compared to the bandwidth of the system. Usually the crosscorrelation 
operation is carried out by sampling the input time function and the output time 
function, delaying the samples of the input, and then averaging the product of the 
delayed samples of the input and the samples of the output. A block diagram of a 
system that accomplishes this is shown in Figure 8-10. In order to analyze this 
method in more detail, let samples of the input time function, X(t), be designated 
as 

$$X_k = X(k\, \Delta t) \qquad k = 0, 1, 2, \ldots, N$$
[Figure 8-10: block diagram. A generator drives the system being measured, h(t). 
Samplers follow both the generator and the system output, and the input samples 
are delayed to give X(kΔt - nΔt).] 

Figure 8-10 Block diagram of a system that will estimate the impulse response of 
a linear system. 


where Δt is the time interval between samples. In a similar way, let samples of 
the output time function be designated as 

$$Y_k = Y(k\, \Delta t) \qquad k = 0, 1, 2, \ldots, N$$

An estimate of the nth sample of the crosscorrelation function between the input 
and output is obtained from 

$$\hat{R}_{XY}(n\, \Delta t) = \frac{1}{N - n + 1} \sum_{k=n}^{N} X_{k-n} Y_k \qquad n = 0, 1, 2, \ldots, M \ll N$$

In order to relate this estimate to an estimate of the system impulse response, it 
is necessary to relate the variance of the samples of X(t) to the spectral density of 
the random process from which they came. If the bandwidth of the input process 
is sufficiently large that samples spaced Δt seconds apart may be assumed to be 
statistically independent, these samples can be imagined to have come from a 
bandlimited white process whose bandwidth is W = 1/(2Δt). Since the variance of such 
a white bandlimited process is just $2S_0W$, it follows that $S_0 = \sigma_X^2\, \Delta t$. It doesn't 
matter what the actual spectral density is. Independent samples from any process 
are indistinguishable from independent samples from a white bandlimited process 
having the same variance. Thus, an estimate of the system impulse response is, 
from (8-36), given by 

$$\hat{h}(n\, \Delta t) = \frac{1}{\sigma_X^2\, \Delta t}\, \hat{R}_{XY}(n\, \Delta t) = \frac{1}{\sigma_X^2\, \Delta t\, (N - n + 1)} \sum_{k=n}^{N} X_{k-n} Y_k \qquad n = 0, 1, 2, \ldots, M \ll N \tag{8-46}$$
By taking the expected value of (8-46) it is straightforward to show that this is 
an unbiased estimate of the system impulse response. Furthermore, it can be 
shown that the variance of this estimate is bounded by 

$$\mathrm{Var}[\hat{h}(n\, \Delta t)] \le \frac{2}{N} \sum_{k=0}^{M} h^2(k\, \Delta t) \tag{8-47}$$

Often it is more convenient to replace the summation in (8-47) by 

$$\sum_{k=0}^{M} h^2(k\, \Delta t) \approx \frac{1}{\Delta t} \int_0^\infty h^2(t)\, dt \tag{8-48}$$

Note that the bound on the variance of the estimate does not depend upon which 
sample of the impulse response is being estimated. 
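A small simulation illustrates the estimator (8-46). Everything in this sketch is an assumption made for illustration: the system is represented by a discrete-time stand-in in which each output sample is Δt times a weighted sum of past input samples, the input samples are independent Gaussian values, and the names `simulate` and `estimate_impulse_response` are hypothetical.

```python
import math
import random

def estimate_impulse_response(x, y, sigma2, dt, m):
    """Estimator (8-46): h_hat[n] for n = 0..m from samples x[0..N] and y[0..N]."""
    big_n = len(x) - 1
    h_hat = []
    for n in range(m + 1):
        acc = sum(x[k - n] * y[k] for k in range(n, big_n + 1))
        h_hat.append(acc / (sigma2 * dt * (big_n - n + 1)))
    return h_hat

def simulate(h, dt, taps, n_samples, sigma, seed=1):
    """Independent Gaussian input samples driving a discrete-time model of the system."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, sigma) for _ in range(n_samples)]
    weights = [h(j * dt) * dt for j in range(taps)]   # Riemann-sum approximation of the convolution
    y = []
    for k in range(n_samples):
        jmax = min(taps, k + 1)
        y.append(sum(weights[j] * x[k - j] for j in range(jmax)))
    return x, y

b, dt = 5.0, 0.02
h_true = lambda t: b * math.exp(-b * t)               # illustrative first-order system
x, y = simulate(h_true, dt, taps=60, n_samples=20000, sigma=1.0)
h_hat = estimate_impulse_response(x, y, sigma2=1.0, dt=dt, m=10)
for n in (0, 5, 10):
    print(f"t = {n * dt:.2f}  estimate = {h_hat[n]:.3f}  true = {h_true(n * dt):.3f}")
```

With 20,000 samples the bound (8-47) gives a standard deviation of roughly 0.12 for each estimated point in this setup, so the estimates scatter about the true values by about that amount.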

The above results are useful in determining how many samples of the input and 
output are necessary to achieve a given accuracy in the estimation of an impulse 
response. In order to illustrate this, assume it is desired to estimate an impulse 
response of the form 

$$h(t) = 5 e^{-5t} \sin 20t\; u(t)$$

with an error of less than 1 percent of the maximum value of the impulse 
response. Since the maximum value of this impulse response is about 3.376 at t = 
0.0785, the variance of the estimate should be less than (0.01 × 3.376)² = 
0.0011. Furthermore, 

$$\int_0^\infty h^2(t)\, dt = 1.25$$

Thus, from (8-47) and (8-48), the number of samples required to achieve the 
desired accuracy is bounded by 

$$N \ge \frac{2 \times 1.25}{0.0011\, \Delta t}$$


The selection of Δt is governed by the desired number of points, M, at which 
h(t) is to be estimated and by the length of the time interval over which h(t) has 
a significant magnitude. To illustrate this, suppose that in the above example it is 
desired to estimate the impulse response at 50 points over the range in which it is 
greater than 1 percent of its maximum value. Since the sine function can never be 
greater than 1.0, it follows that $5e^{-5t} = 0.01 \times 3.376$ implies that the greatest 
delay interval that must be considered is about 1 second. Thus, a value Δt = 
1/50 = 0.02 seconds should be adequate. The bandwidth of an ideal white 
bandlimited source that would provide independent samples at intervals 0.02 seconds 
apart is 25 Hz, but a more practical source should probably have a half-power 
bandwidth of about 250 Hz to guarantee the desired independence. 
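The choice of Δt described above amounts to three short calculations. The following sketch repeats them for the stated example; the function name `sampling_plan` and its argument names are hypothetical, and the delay span is left unrounded rather than rounded to 1 second as in the text.

```python
import math

def sampling_plan(envelope_amp, decay, threshold, points):
    """Return (t_max, dt, w_hz): delay span, sample spacing, and ideal source bandwidth."""
    # Solve envelope_amp * exp(-decay * t) = threshold for t.
    t_max = math.log(envelope_amp / threshold) / decay
    dt = t_max / points
    # A bandlimited white source of bandwidth W gives independent samples 1/(2W) apart.
    w_hz = 1.0 / (2.0 * dt)
    return t_max, dt, w_hz

t_max, dt, w_hz = sampling_plan(envelope_amp=5.0, decay=5.0,
                                threshold=0.01 * 3.376, points=50)
print(f"t_max = {t_max:.4f} s, dt = {dt:.4f} s, W = {w_hz:.2f} Hz")
```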
Exercise 8—6.1 


White noise having a two-sided spectral density of 0.01 is applied to 
the input of a finite-time integrator having an impulse response of 

$$h(t) = \frac{1}{3} [u(t) - u(t - 3)]$$

Find the value of the autocorrelation function of the output at 


a) τ = 0 
b) τ = 1 
c) τ = 2. 


Answers: 0.1111, 0.2222, 0.3333 


Exercise 8—6.2 


A dc signal plus noise has sample functions of the form 
X(t) = A + N(t) 
where N(t) has an autocorrelation function of the form 


ir 


Ku) 9 15 9i 


Ir| = 0.01 


A finite-time integrator is used to estimate the value of A with an rms 
error of less than 0.02. If the impulse response of this integrator is 

$$h(t) = \frac{1}{T} [u(t) - u(t - T)]$$

find the value of T required to accomplish this. 


Answer: 25 





8—7 Analysis in the Frequency Domain 


The most common method of representing linear systems in the frequency domain 
is in terms of the system function H(ω) or the transfer function H(s), which are 
the Fourier and Laplace transforms, respectively, of the system impulse response. 
If the input to a system is x(t) and the output y(t), then the Fourier transforms of 
these quantities are related by 

$$Y(\omega) = X(\omega) H(\omega) \tag{8-49}$$

and the Laplace transforms are related by 

$$Y(s) = X(s) H(s) \tag{8-50}$$

provided the transforms exist. Neither of these forms is suitable when X(t) is a 
sample function from a stationary random process. As discussed in Section 7-1, 
the Fourier transform of a sample function from a stationary random process 
generally does not exist. In the case of the one-sided Laplace transform the input-output 
relationship is defined only for time functions existing for t > 0, and such time 
functions can never be sample functions from a stationary random process. 

One approach to this problem is to make use of the spectral density of the 
process and to carry out the analysis using a truncated sample function in which 
the limit T → ∞ is not taken until after the averaging operations are carried out. 
This procedure is valid and leads to correct results. There is, however, a much 
simpler procedure that can be used. In Section 7—6 it was shown that the spectral 
density of a stationary random process is the Fourier transform of the autocorre- 
lation function of the process. Therefore, using the results we have already ob- 
tained for the correlation function of the output of a linear time-invariant system, 
we can obtain the corresponding results for the spectral density by carrying out 
the required transformations. When the basic relationship has been obtained, it 
will be seen that there is a close analogy between computations involving nonran- 
dom signals and those involving random signals. 


8—8 Spectral Density at the System Output 


The spectral density of a process is a measure of how the average power of the 
process is distributed with respect to frequency. No information regarding the 
phases of the various frequency components is contained in the spectral density. 
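The relationship between the spectral density $S_X(\omega)$ and the autocorrelation 
function $R_X(\tau)$ for a stationary process was shown to be 

$$S_X(\omega) = \mathcal{F}\{R_X(\tau)\} \tag{8-51}$$

Using this relationship and (8-17), which relates the output autocorrelation 
function $R_Y(\tau)$ to the input correlation function $R_X(\tau)$ by means of the system impulse 
response, we have 

$$R_Y(\tau) = \int_0^\infty d\lambda_1 \int_0^\infty R_X(\lambda_2 - \lambda_1 - \tau)\, h(\lambda_1) h(\lambda_2)\, d\lambda_2$$

$$S_Y(\omega) = \mathcal{F}\{R_Y(\tau)\} = \int_{-\infty}^{\infty} \left[ \int_0^\infty d\lambda_1 \int_0^\infty R_X(\lambda_2 - \lambda_1 - \tau)\, h(\lambda_1) h(\lambda_2)\, d\lambda_2 \right] e^{-j\omega\tau}\, d\tau$$

Interchanging the order of integration and carrying out the indicated operations 
gives 

$$\begin{aligned} S_Y(\omega) &= \int_0^\infty d\lambda_1 \int_0^\infty h(\lambda_1) h(\lambda_2)\, d\lambda_2 \int_{-\infty}^{\infty} R_X(\lambda_2 - \lambda_1 - \tau)\, e^{-j\omega\tau}\, d\tau \\ &= \int_0^\infty d\lambda_1 \int_0^\infty h(\lambda_1) h(\lambda_2)\, S_X(\omega)\, e^{-j\omega(\lambda_2 - \lambda_1)}\, d\lambda_2 \\ &= S_X(\omega) \int_0^\infty h(\lambda_1)\, e^{j\omega\lambda_1}\, d\lambda_1 \int_0^\infty h(\lambda_2)\, e^{-j\omega\lambda_2}\, d\lambda_2 \\ &= S_X(\omega)\, H(-\omega) H(\omega) \\ &= S_X(\omega)\, |H(\omega)|^2 \end{aligned} \tag{8-52}$$

In arriving at (8-52) use was made of the property that $R_X(-\tau) = R_X(\tau)$. 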

From (8-52) it is seen that the output spectral density is related to the input 
spectral density by the power transfer function, $|H(\omega)|^2$. This result can also be 
expressed in terms of the complex frequency s as 

$$S_Y(s) = S_X(s) H(s) H(-s) \tag{8-53}$$

where $S_X(s)$ and $S_Y(s)$ are obtained from $S_X(\omega)$ and $S_Y(\omega)$ by substituting $-s^2 = \omega^2$, 
and where H(s) is obtained from H(ω) by substituting s = jω. It is this form 
that will be used in further discussions of frequency analysis methods. 

It is clear from (8-53) that the quantity H(s)H(—s) plays the same role in 
relating input and output spectral densities as H(s) does in relating input and out- 
put transforms. This similarity makes the use of frequency domain techniques for 
systems with rational transfer functions very convenient when the input is a sample 
function from a stationary random process. However, this same technique is not 
always applicable when the input process is nonstationary, even though the defi- 
nition for the spectral density of such processes is the same as we have employed. 
A detailed study of this matter is beyond the scope of the present discussion but 
the reader would do well to question any application of (8—53) for nonstationary 
processes. 

Since the spectral density of the system output has now been obtained, it is a simple 
matter to determine the mean-square value of the output. This is simply 

$$\overline{Y^2} = \frac{1}{2\pi j} \int_{-j\infty}^{j\infty} H(s) H(-s) S_X(s)\, ds \tag{8-54}$$


and may be evaluated by either of the methods discussed in Section 7—5. 
In order to illustrate some of the methods, consider the RC circuit shown in 
Figure 8-11 and assume that its input is a sample function from a white-noise 
process having a spectral density of $S_0$. The spectral density at the output is simply 

$$S_Y(s) = \frac{b}{s + b} \cdot \frac{b}{-s + b} \cdot S_0 = \frac{-b^2 S_0}{s^2 - b^2} \tag{8-55}$$





The mean-square value of the output can be obtained by using the integral $I_1$, 
tabulated in Table 7-1, Section 7-5. In order to do this, it is convenient to write 
(8-55) as 

$$S_Y(s) = \frac{(b\sqrt{S_0})(b\sqrt{S_0})}{(s + b)(-s + b)}$$

from which it is clear that n = 1, and 

$$c(s) = b\sqrt{S_0} = c_0$$

$$d(s) = s + b$$

Thus 

$$d_0 = b \qquad d_1 = 1$$

and 

$$\overline{Y^2} = I_1 = \frac{c_0^2}{2 d_0 d_1} = \frac{b^2 S_0}{2b} = \frac{b S_0}{2} \tag{8-56}$$

As a slightly more complicated example, let the input spectral density be 

$$S_X(s) = \frac{-\beta^2 S_0}{s^2 - \beta^2} \tag{8-57}$$

This spectral density, which corresponds to the autocorrelation function used in 
Section 8-4, has been selected so that its value at zero frequency is $S_0$. The 
spectral density at the output of the RC circuit is now 

$$S_Y(s) = \frac{b}{s + b} \cdot \frac{b}{-s + b} \cdot \frac{-\beta^2 S_0}{s^2 - \beta^2} = \frac{b^2 \beta^2 S_0}{(s^2 - b^2)(s^2 - \beta^2)} \tag{8-58}$$

[Figure 8-11: RC circuit with input X(t), resistor R, capacitor C, and output 
Y(t); H(s) = b/(s + b).] 

Figure 8-11 A simple RC circuit. 
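The mean-square value for this output will be evaluated by using the integral $I_2$ 
tabulated in Table 7-1. Thus, from 

$$\frac{c(s) c(-s)}{d(s) d(-s)} = \frac{(b\beta\sqrt{S_0})(b\beta\sqrt{S_0})}{[s^2 + (b + \beta)s + b\beta][s^2 - (b + \beta)s + b\beta]} \tag{8-59}$$

it is clear that n = 2, and 

$$c_0 = b\beta\sqrt{S_0} \qquad c_1 = 0$$

$$d_0 = b\beta \qquad d_1 = b + \beta \qquad d_2 = 1$$

Hence, 

$$\overline{Y^2} = I_2 = \frac{c_0^2 d_2 + c_1^2 d_0}{2 d_0 d_1 d_2} = \frac{b^2 \beta^2 S_0}{2 b \beta (b + \beta)} = \frac{b \beta S_0}{2(b + \beta)} \tag{8-60}$$

since $c_1 = 0$. 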
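Both tabulated-integral results can be confirmed by integrating the output spectral density over frequency. The sketch below is an independent numerical check, not the table-lookup method of the text: it evaluates (1/2π)∫S_Y(ω)dω with the substitution ω = tan θ, which is an assumption of convenience, and the parameter values and function names are hypothetical.

```python
import math

def mean_square(spectral_density, n=2000):
    """(1/(2*pi)) * integral of S(w) over all w, using w = tan(theta) and the midpoint rule."""
    dtheta = math.pi / n
    total = 0.0
    for i in range(n):
        theta = -0.5 * math.pi + (i + 0.5) * dtheta
        w = math.tan(theta)
        total += spectral_density(w) / math.cos(theta) ** 2   # dw = d(theta) / cos^2(theta)
    return total * dtheta / (2.0 * math.pi)

# Illustrative parameter values (not from the text).
b, beta, s0 = 2.0, 7.0, 3.0
power_gain = lambda w: b * b / (w * w + b * b)                 # |H(w)|^2 for H(s) = b/(s + b)
white_in = lambda w: s0 * power_gain(w)                        # (8-55) with s = jw
coloured_in = lambda w: beta ** 2 * s0 / (w * w + beta ** 2) * power_gain(w)   # (8-58) with s = jw

print(mean_square(white_in), b * s0 / 2.0)                           # compare with (8-56)
print(mean_square(coloured_in), b * beta * s0 / (2.0 * (b + beta)))  # compare with (8-60)
```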


It is also of interest to look once again at the results when the input random 
process has a bandwidth much greater than the system bandwidth; that is, when 
β ≫ b. From (8-58) it is clear that 

$$S_Y(s) = \frac{-b^2 S_0}{(s^2 - b^2)(1 - s^2/\beta^2)} \tag{8-61}$$

and as β becomes large this spectral density approaches that for the 
white-input-noise case given by (8-55). In the case of the mean-square value, (8-60) may be 
written as 

$$\overline{Y^2} = \frac{b S_0}{2(1 + b/\beta)} \tag{8-62}$$

which approaches the white-noise result of (8-56) when β is large. 

Comparison of the foregoing examples with similar ones employing time-do- 
main methods should make it evident that when the input spectral density and the 
system transfer function are rational, frequency domain methods are usually sim- 
pler. In fact, the more complicated the system, the greater the advantage of such 
methods. When either the input spectral density or the system transfer function is 
not rational, this conclusion may not hold. 





Exercise 8—8.1 


White noise having a two-sided spectral density of 2 V²/Hz is applied 
to the input of a linear system having an impulse response of 

$$h(t) = t e^{-3t} u(t)$$

a) Find the value of the output spectral density at ω = 0. 
b) Find the value of the output spectral density at ω = 3. 


c) Find the mean-square value of the output. 


Answers: 0.00617, 0.0185, 0.0247 


Exercise 8—8.2 


Find the mean-square value of the output of the system in Exercise 
8—8.1 if the input has a spectral density of 


1800 


dci 


Answer: 0.0185 





8—9 Cross-Spectral Densities Between Input 
and Output 

The cross-spectral densities between a system input and output are not widely 
used, but it is well to be aware of their existence. The derivation of these 
quantities would follow the same general pattern as shown above, but only the end 
results are quoted here. Specifically, they are 

$$S_{XY}(s) = H(s) S_X(s) \tag{8-63}$$

and 

$$S_{YX}(s) = H(-s) S_X(s) \tag{8-64}$$
The cross-spectral densities are related to the crosscorrelation functions between 
input and output in exactly the same way as ordinary spectral densities and 
autocorrelation functions are related. Thus, 

$$S_{XY}(s) = \int_{-\infty}^{\infty} R_{XY}(\tau)\, e^{-s\tau}\, d\tau$$

and 

$$S_{YX}(s) = \int_{-\infty}^{\infty} R_{YX}(\tau)\, e^{-s\tau}\, d\tau$$

Likewise, the inverse two-sided Laplace transform can be used to find the 
crosscorrelation functions from the cross-spectral densities, but these relations will not 
be repeated here. As noted in Section 7-8, it is not necessary that cross-spectral 
densities be even functions of ω, or that they be real and positive. 

To illustrate the above results, we consider again the circuit of Figure 8-11 
with an input of white noise having a two-sided spectral density of $S_0$. From 
(8-63) and (8-64) the two cross-spectral densities are 

$$S_{XY}(s) = \frac{b S_0}{s + b}$$

and 

$$S_{YX}(s) = \frac{-b S_0}{s - b}$$

If these are expressed as functions of ω by letting s = jω, it is obvious that the 
cross-spectral densities are not real, even, positive functions of ω. Clearly, similar 
results can be obtained for any other input spectral density. 
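Evaluating the two expressions at s = jω makes the point concrete. This is a minimal sketch with illustrative parameter values; the function name `cross_spectral_densities` is hypothetical.

```python
def cross_spectral_densities(w, b, s0):
    """(8-63) and (8-64) for H(s) = b/(s + b) with white-noise input, evaluated at s = jw."""
    s = 1j * w
    s_xy = b * s0 / (s + b)
    s_yx = b * s0 / (-s + b)
    return s_xy, s_yx

b, s0 = 1.0, 2.0
for w in (0.0, 1.0, -1.0):
    s_xy, s_yx = cross_spectral_densities(w, b, s0)
    print(f"w = {w:+.1f}: S_XY = {s_xy:.4f}, S_YX = {s_yx:.4f}")
```

At any nonzero ω the two values are complex conjugates of one another, and their product divided by S_0 equals the output spectral density (8-55) at the same frequency.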





Exercise 8—9.1 
White noise having a two-sided spectral density of 0.5 V²/Hz is applied 
to the input of a finite-time integrator whose impulse response is 
h(t) = [u(t) - u(t - 1)] 
Find the values of both cross-spectral densities at 


a) ω = 0 
b) ω = 0.5 
c) ω = 1. 


Answers: 0.5, 0.4794 ± j0.1224, 0.4207 ± j0.2298 
Exercise 8—9.2 


X(t) and Y(t) are from independent random processes having identical 
spectral densities of 





Sy(s) = Sy(s) = 


s$ — | 

a) Find both cross-spectral densities, $S_{XY}(s)$ and $S_{YX}(s)$. 

b) Find both cross-spectral densities, $S_{UV}(s)$ and $S_{VU}(s)$, where 
U(t) = X(t) + Y(t) and V(t) = X(t) - Y(t). 


Answers: 0, 0, 0,0 





8—10 Examples of Frequency-Domain Analysis 


Frequency-domain methods tend to be most useful when dealing with conven- 
tional filters and random processes that have rational spectral densities. However, 
it is often possible to make the calculations even simpler, without introducing 
much error, by idealizing the filter characteristics and assuming the input pro- 
cesses to be white. An important concept in doing this is that of equivalent-noise 
bandwidth. 

The equivalent-noise bandwidth, B, of a system is defined to be the bandwidth 
of an ideal filter that has the same maximum gain and the same mean-square value 
at its output as the actual system when the input is white noise. This concept is 
illustrated in Figure 8-12 for both lowpass and bandpass systems. It is clear that 
the rectangular power transfer function of the ideal filter must have the same area 
as the power transfer function of the actual system if they are to produce the same 
mean-square outputs with the same white-noise input. Thus, in the lowpass case, 
the equivalent-noise bandwidth is given by 

$$B = \frac{1}{4\pi |H(0)|^2} \int_{-\infty}^{\infty} |H(\omega)|^2\, d\omega = \frac{1}{4\pi j |H(0)|^2} \int_{-j\infty}^{j\infty} H(s) H(-s)\, ds \quad \text{Hz} \tag{8-65}$$

[Figure 8-12: sketches of the power transfer function $|H(\omega)|^2$ of the actual 
system together with the equivalent rectangular characteristic; in the lowpass 
case the rectangle extends from -2πB to 2πB.] 

Figure 8-12 Equivalent-noise bandwidth of systems: (a) lowpass system and (b) 
bandpass system. 

If the input to the system is white noise with a spectral density of $S_0$, the 
mean-square value of the output is given by 

$$\overline{Y^2} = 2 S_0 B |H(0)|^2 \tag{8-66}$$

In the bandpass case, $|H(0)|^2$ is replaced by $|H(\omega_0)|^2$ in both (8-65) and (8-66). 

As a simple illustration of the calculation of equivalent-noise bandwidth, 
consider the RC circuit of Figure 8-11. Since the integral of (8-65) has already been 
evaluated in obtaining the mean-square value of (8-56), it is easiest to use this 
result and (8-66). Thus, 

$$\overline{Y^2} = \frac{b S_0}{2} = 2 S_0 B |H(0)|^2$$

Since $|H(0)|^2 = 1$, it follows that 

$$B = \frac{b}{4} = \frac{1}{4RC} \quad \text{Hz} \tag{8-67}$$
It is of interest to compare the equivalent-noise bandwidth of (8-67) with the 
more familiar half-power bandwidth. For a lowpass system, such as this RC 
circuit, the half-power bandwidth is defined to be that frequency at which the 
magnitude of the transfer function drops to 1/√2 of its value at zero frequency. For 
this RC filter the half-power bandwidth is simply 

$$B_{1/2} = \frac{1}{2\pi RC}$$

Hence, the equivalent-noise bandwidth is just π/2 times the half-power bandwidth 
for this particular circuit. If the transfer function of a system has steeper sides, 
then the equivalent-noise bandwidth and the half-power bandwidth are more 
nearly equal. 

It is also possible to express the equivalent-noise bandwidth in terms of the 
system impulse response rather than the transfer function. Note first that 

$$H(0) = \int_0^\infty h(t)\, dt$$

Then apply Parseval's theorem to the integral in (8-65). 

$$\int_0^\infty h^2(t)\, dt = \frac{1}{2\pi} \int_{-\infty}^{\infty} |H(\omega)|^2\, d\omega$$

Using these relations, the equivalent-noise bandwidth becomes 

$$B = \frac{\displaystyle\int_0^\infty h^2(t)\, dt}{2 \left[ \displaystyle\int_0^\infty h(t)\, dt \right]^2} \tag{8-68}$$


The time-domain representation of equivalent-noise bandwidth may be simpler 
to use than the frequency-domain representation for systems having nonrational 
transfer functions. To illustrate this, consider the finite-time integrator defined as 
usual by the impulse response 

$$h(t) = \frac{1}{T} [u(t) - u(t - T)]$$

Thus, 

$$\int_0^\infty h(t)\, dt = 1$$

and 

$$\int_0^\infty h^2(t)\, dt = \frac{1}{T}$$

Hence, the equivalent-noise bandwidth is 

$$B = \frac{1/T}{2(1)^2} = \frac{1}{2T}$$
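The time-domain formula (8-68) is easy to evaluate numerically for any impulse response. The sketch below applies it to the two systems treated in this section; the function name, the midpoint-rule integration, and the parameter values are assumptions made for illustration.

```python
import math

def equivalent_noise_bandwidth(h, t_end, n=100000):
    """(8-68): B = int h^2 dt / (2 * (int h dt)^2), midpoint rule on [0, t_end]."""
    dt = t_end / n
    area = 0.0
    energy = 0.0
    for i in range(n):
        v = h((i + 0.5) * dt)
        area += v
        energy += v * v
    return (energy * dt) / (2.0 * (area * dt) ** 2)

T = 0.5
integrator = lambda t: 1.0 / T if 0.0 <= t < T else 0.0   # finite-time integrator
b = 4.0
rc_filter = lambda t: b * math.exp(-b * t)                 # RC circuit with b = 1/RC

print(equivalent_noise_bandwidth(integrator, T), 1.0 / (2.0 * T))
print(equivalent_noise_bandwidth(rc_filter, 40.0 / b), b / 4.0)
```

For the RC circuit the upper limit is set to 40 time constants, beyond which the impulse response is negligible.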





It is also of interest to relate this equivalent-noise bandwidth to the half-power 
bandwidth of the finite-time integrator. From the Fourier transform of h(t) the 
transfer function becomes 


and this has a half-power point at $B_{1/2} = 0.221/T$. 
Thus, 

$$B = 2.26\, B_{1/2}$$

One advantage of using equivalent-noise bandwidth is that it becomes possible 
to describe the noise response of even very complicated systems with only two 
numbers, B and $|H(\omega_0)|$. 

Furthermore, these numbers can be measured quite readily in an experimental
system. For example, suppose that a receiver in a communication system is
measured and found to have a voltage gain of $10^6$ at the frequency to which it
is tuned and an equivalent-noise bandwidth of 10 kHz. The noise at the input to
this receiver, from shot noise and thermal agitation, has a bandwidth of several
hundred megahertz and, hence, can be assumed to be white over the bandwidth of
the receiver. Suppose this noise has a spectral density of $2 \times 10^{-20}$
V$^2$/Hz. (This is a realistic value for the input circuit of a high-quality
receiver.) What should the effective value of the input signal be in order to
achieve an output signal-to-noise power ratio of 100? The answer to this
question would be very difficult to find if every stage of the receiver had to
be analyzed exactly. It is very easy, however, using the equivalent-noise
bandwidth since


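\[ (S/N)_o = \frac{|H(\omega_0)|^2\,\overline{X^2}}{2N_0B\,|H(\omega_0)|^2} = \frac{\overline{X^2}}{2N_0B} \tag{8-69} \]

if $\overline{X^2}$ is the mean-square value of the input signal and $N_0$ is the
spectral density of the input noise. Thus,

\[ \frac{\overline{X^2}}{2N_0B} = 100 \]

and

\[ \overline{X^2} = 2N_0B(100) = 2(2 \times 10^{-20})(10^4)(100) = 4 \times 10^{-14} \]

from which

\[ \sqrt{\overline{X^2}} = 2 \times 10^{-7}\ \text{V} \]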


is the effective signal voltage being sought. Note that the actual value of receiver 
gain, although specified, was not needed to find the output signal-to-noise ratio. 
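The arithmetic of this example can be reproduced in a few lines. This is only a sketch: the function name `required_input_rms` and its parameter names are hypothetical, and the calculation assumes, as the text does, that the input noise is white over the receiver bandwidth so that (8-69) applies.

```python
import math

def required_input_rms(n0, bandwidth_hz, snr_out):
    """Input rms voltage that yields the requested output S/N power ratio.

    Uses (8-69) rearranged: mean-square input = 2 * N0 * B * (S/N).
    n0 is the input noise spectral density in V^2/Hz, bandwidth_hz is the
    equivalent-noise bandwidth B in Hz. The receiver gain does not appear
    because it cancels in the ratio.
    """
    mean_square = 2.0 * n0 * bandwidth_hz * snr_out
    return math.sqrt(mean_square)

# Values from the receiver example: N0 = 2e-20 V^2/Hz, B = 10 kHz, S/N = 100.
v_rms = required_input_rms(n0=2e-20, bandwidth_hz=1e4, snr_out=100.0)
print("required input rms voltage (V):", v_rms)
```

With the values of the example the result is about 0.2 microvolt, which matches the effective signal voltage found above.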

It should be emphasized that the equivalent-noise bandwidth is useful only 
when one is justified in assuming that the spectral density of the input random 
process is white. If the input spectral density changes appreciably over the range 
of frequencies passed by the system, then significant errors can result from em- 
ploying the concept. 

The final example of frequency-domain analysis will consider the feedback
system shown in Figure 8-13. This system might be a control system for
positioning a radar antenna, in which x(t) is the input control angle (assumed
to be random since target position is unknown in advance) and y(t) is the
angular position of the antenna in response to this voltage. The disturbance
n(t) might represent the effects of wind blowing on the antenna, thus producing
random perturbations on the angular position. The transfer function of the
amplifier and motor within the feedback loop is

\[ H(s) = \frac{A}{s(s+1)} \]

[Figure: feedback loop with forward block A/[s(s+1)], disturbance n(t), and output y(t) + m(t).]

Figure 8-13 An automatic control system.


The transfer function relating $X(s) = \mathcal{L}[x(t)]$ and $Y(s) = \mathcal{L}[y(t)]$
can be obtained by letting n(t) = 0 and noting that

\[ Y(s) = H(s)[X(s) - Y(s)] \]

since the input to the amplifier is the difference between the input control
signal and the system output. Hence,

\[ H_c(s) = \frac{Y(s)}{X(s)} = \frac{H(s)}{1 + H(s)} = \frac{A}{s^2 + s + A} \tag{8-70} \]

If the spectral density of the input control signal (now considered to be a
sample function from a random process) is

\[ S_X(s) = \frac{-2}{s^2 - 1} \]

then the spectral density of the output is

\[ S_Y(s) = S_X(s)H_c(s)H_c(-s) = \frac{-2A^2}{(s^2 - 1)(s^2 + s + A)(s^2 - s + A)} \tag{8-71} \]


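The mean-square value of the output is given by

\[ \overline{Y^2} = \frac{1}{2\pi j}\int_{-j\infty}^{j\infty} \frac{2A^2\,ds}{[s^3 + 2s^2 + (A+1)s + A][-s^3 + 2s^2 - (A+1)s + A]} = 2A^2 I_3 \]

in which

\[ c_0 = 1, \quad c_1 = 0, \quad c_2 = 0 \]
\[ d_0 = A, \quad d_1 = A + 1, \quad d_2 = 2, \quad d_3 = 1 \]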

From Table 7-1, this becomes

\[ \overline{Y^2} = \frac{2A}{A + 2} \tag{8-72} \]

The transfer function relating $N(s) = \mathcal{L}[n(t)]$ and $M(s) = \mathcal{L}[m(t)]$
is not the same as (8-70) because the disturbance enters the system at a
different point. It is apparent, however, that

\[ M(s) = N(s) - H(s)M(s) \]

from which

\[ H_n(s) = \frac{M(s)}{N(s)} = \frac{1}{1 + H(s)} = \frac{s(s+1)}{s^2 + s + A} \tag{8-73} \]


Let the interfering noise have a spectral density of

\[ S_N(s) = \delta(s) - \frac{1}{s^2 - 0.25} \]

This corresponds to an input disturbance that has an average value as well as a
random variation. The spectral density of the output disturbance becomes

\[ S_M(s) = S_N(s)H_n(s)H_n(-s) = \left[\delta(s) - \frac{1}{s^2 - 0.25}\right]\frac{s^2(s^2 - 1)}{(s^2 + s + A)(s^2 - s + A)} \tag{8-74} \]


The mean-square value of the output disturbance comes from

\[ \overline{M^2} = \frac{1}{2\pi j}\int_{-j\infty}^{j\infty}\left[\delta(s) - \frac{1}{s^2 - 0.25}\right]\frac{s^2(s^2 - 1)}{(s^2 + s + A)(s^2 - s + A)}\,ds \]

Since the integrand vanishes at s = 0, the integral over $\delta(s)$ does not
contribute anything to the mean-square value. The remaining terms are

\[ \overline{M^2} = \frac{1}{2\pi j}\int_{-j\infty}^{j\infty} \frac{s(s+1)(-s)(-s+1)\,ds}{[s^3 + 1.5s^2 + (A+0.5)s + 0.5A][-s^3 + 1.5s^2 - (A+0.5)s + 0.5A]} = I_3 \]

The constants required for Table 7-1 are

\[ c_0 = 0, \quad c_1 = 1, \quad c_2 = 1 \]
\[ d_0 = 0.5A, \quad d_1 = A + 0.5, \quad d_2 = 1.5, \quad d_3 = 1 \]

and the mean-square value becomes

\[ \overline{M^2} = \frac{A + 1.5}{2A + 1.5} \tag{8-75} \]

The amplifier gain A has not been specified in the example in order that the 
effects of changing this gain can be made evident. It is clear from (8-72) and
(8-75) that the desired signal mean-square value increases with larger values of A
while the undesired noise mean-square value decreases. Thus, one would expect 
that large values of A would be desirable if output signal-to-noise ratio is the 
important criterion. In actuality, the dynamic response of the system to rapid input 
changes may be more important and this would limit the value of A that can be 
used. 
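The dependence of both mean-square values on the gain A can be tabulated with a short script. The sketch below is an illustration only: it assumes that the I_3 entry of Table 7-1 (not reproduced in this section) has the usual closed form for integrals of c(s)c(-s)/[d(s)d(-s)], and the function names `i3`, `signal_mean_square`, and `disturbance_mean_square` are hypothetical. The c and d coefficients are those of the two integrals in this example.

```python
def i3(c, d):
    """Closed-form I_3 for c(s) = c2*s^2 + c1*s + c0 and d(s) = d3*s^3 + d2*s^2 + d1*s + d0.

    Assumed to match the n = 3 entry of Table 7-1; c = (c0, c1, c2), d = (d0, d1, d2, d3).
    """
    c0, c1, c2 = c
    d0, d1, d2, d3 = d
    num = c2 * c2 * d0 * d1 + (c1 * c1 - 2.0 * c0 * c2) * d0 * d3 + c0 * c0 * d2 * d3
    den = 2.0 * d0 * d3 * (d1 * d2 - d0 * d3)
    return num / den

def signal_mean_square(a):
    # Output signal: 2*A^2*I_3 with c(s) = 1 and d(s) = s^3 + 2s^2 + (A+1)s + A.
    return 2.0 * a * a * i3((1.0, 0.0, 0.0), (a, a + 1.0, 2.0, 1.0))

def disturbance_mean_square(a):
    # Output disturbance: I_3 with c(s) = s^2 + s and
    # d(s) = s^3 + 1.5s^2 + (A+0.5)s + 0.5A.
    return i3((0.0, 1.0, 1.0), (0.5 * a, a + 0.5, 1.5, 1.0))

for gain in (1.0, 10.0, 100.0):
    print(gain, signal_mean_square(gain), disturbance_mean_square(gain))
```

For any positive gain the two functions should agree with the closed forms 2A/(A + 2) and (A + 1.5)/(2A + 1.5), so the printed signal value rises toward 2 and the disturbance value falls toward 0.5 as A grows.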





Exercise 8—10.1 


Find the equivalent-noise bandwidth of the transfer function 


T 
Answer: Pid /2 


Exercise 8—10.2 


Find the equivalent-noise bandwidth of the system whose impulse re- 
sponse is 


h(t) = (1 - t)[u(t) - u(t - 1)]


Answer: 2/3 





PROBLEMS 


8-2.1 A deterministic random signal has sample functions of

\[ X(t) = M + B\cos(20t + \theta) \]

in which M is a random variable having a Gaussian probability density
function with a mean of 5 and a variance of 64, B is a random variable
having a Rayleigh probability density function with a mean-square value
of 32, and $\theta$ is a random variable that is uniformly distributed from 0 to
$2\pi$. All three random variables are mutually independent. This sample
function is the input to a system having an impulse response of

h(t) = 10e~' u(t)

a) Write an expression for the output of the system.
b) Find the mean value of the output.
c) Find the mean-square value of the output.

8-2.2 Repeat Problem 8-2.1 if the system impulse response is

h(t) = δ(t) - 10e '?' u(t)

8-3.1 A finite-time integrator has an impulse response of

\[ h(t) = \begin{cases} 1 & 0 \le t \le 0.5 \\ 0 & \text{elsewhere} \end{cases} \]

The input to this system is white noise with a two-sided spectral density
of 10 V$^2$/Hz.

a) Find the mean value of the output.
b) Find the mean-square value of the output.
c) Find the variance of the output.

8-3.2 Repeat Problem 8-3.1 if the input to the finite-time integrator is a
sample function from a stationary random process having an autocorrelation
function of

Rx(7) = 16e?"

8-3.3 A sample function from a random process having an autocorrelation
function of

Rx(7) = 16e?" + 16

is the input to a linear system having an impulse response of

h(t) = δ(t) - 2e " u(t)

a) Find the mean value of the output.
b) Find the mean-square value of the output.
c) Find the variance of the output.


8-3.4 2 KO 





The above circuit models a single-stage transistor amplifier including an 
internal noise source for the transistor. The input signal is 


\[ v(t) = 0.1\cos 2000\pi t \]


and i,(f) is a white-noise current source having a spectral density of 
2 x 10^ !^ A?/Hz that models the internal transistor noise. Find the ratio 
of the mean-square output signal to the mean-square output noise. 


8—4.1 0.1 H 







input Output 
X(t) ¥(t) 


White noise having a spectral density of 10 ^ V?/Hz is applied to the 
input of the above circuit. 
a) Find the autocorrelation function of the output. 


b) Find the mean-square value of the output. 


8—4.2 Repeat Problem 8—4.1 if the input to the system is a sample function 
from a stationary random process having an autocorrelation function of 


Ry(r) = 2e 500 


8—4.3 The objective of this problem is to demonstrate a general result that 
frequently arises in connection with many systems analysis problems. 
Consider the arbitrary triangular waveform shown on page 323.
Show that


[ g'(t) dt = (ijo — q) 


for any triangle in which a = c - b. 


8—4.4 Consider a linear system having an impulse response of 
h(t) = [1 - t][u(t) - u(t - 1)]


The input to this system is a sample function from a random process 
having an autocorrelation function of 


\[ R_X(\tau) = 2\delta(\tau) + 9 \]


a) Find the mean value of the output. 
b) Find the mean-square value of the output. 


c) Find the autocorrelation function of the output. 


8—5.1 For the system and input of Problem 8-3.1 find both crosscorrelation 
functions between input and output. 


8—5.2 For the system and input of Problem 8-3.2 find both crosscorrelation 
functions between input and output. 


8—5.3 For the system and input of Problem 8—4.4 find both crosscorrelation 
functions between input and output. 


8-5.4 [Circuit diagram not reproduced; legible labels: input X(t), 1 kΩ, output Y(t).]

The input X(t) to the above circuit is white noise having a spectral density
of 0.1 V$^2$/Hz. Find the crosscorrelation function between the two
outputs, $R_{YZ}(\tau)$, for all $\tau$.

8-6.1 A finite-time integrator has an impulse response of

\[ h(t) = \frac{1}{T}\,[u(t) - u(t - T)] \]

The input to this integrator is a sample function from a stationary random
process having an autocorrelation function of

Rx(7) = ni — K lr] =T
= 0 elsewhere

a) Find the mean-square value of the output.
b) Find the autocorrelation function of the output.

8-6.2 [Figure: white noise applied to two finite-time integrators in cascade, with integration times $T_1$ and $T_2$.]

White noise having a spectral density of 0.001 V$^2$/Hz is applied to the
input of two finite-time integrators connected in cascade as shown in the
figure above. Find the variance of the output if

a) $T_1 = T_2 = 0.1$
b) $T_1 = 0.1$ and $T_2 = 0.01$
c) $T_1 = 0.1$ and $T_2 = 1.0$

8-6.3 It is desired to estimate the mean value of a stationary random process
by averaging N samples from the process. That is, let

\[ \hat{X} = \frac{1}{N}\sum_{i=1}^{N} X_i \]

Derive a general result for the variance of this estimate if:

a) The samples are uncorrelated from one another.
b) The samples are separated by $\Delta t$ and are from a random process
having an autocorrelation function of $R_X(\tau)$.

8-6.4 It is desired to estimate the impulse response of a system by sampling
the input and output of the system and crosscorrelating the samples. The
input samples are independent and have a variance of 2.0. The system
impulse response is of the form

h(t) = 10te~*™' u(t)

and 60 samples of h(t) are to be estimated in the range in which the
impulse response is greater than 2 percent of its maximum value.

a) Find the time separation between samples.
b) Find the number of samples required to estimate the impulse response
with an rms error less than 1 percent of the maximum value of the impulse
response.
c) Find the total amount of time required to make the measurements.


8-7.1 [Circuit diagram not reproduced; legible labels: 2 kΩ, 1 kΩ, 1000 µF, Input, Output.]

a) Determine the transfer function, H(s), for the system shown above.

b) If the input to the system has a Laplace transform of

5
X(s) =
(s) s+4

find $|Y(s)|^2$ where Y(s) is the Laplace transform of the output of the
system.


8-7.2 A three-pole Butterworth filter has poles as shown in the sketch below.
The filter gain at $\omega = 0$ is unity.

a) Write the transfer function $H(\omega)$ for this filter.
b) Write the power transfer function $|H(\omega)|^2$ for this filter.
c) Find |H(s)| for this filter.

8-8.1 Find the spectral density of the output of the system of Problem 8-7.1
if the input is:

a) A sample function of white noise having a spectral density of 0.5 V$^2$/Hz.
b) A sample function from a random process having a spectral density
of

m
EO) EET

8-8.2 The input to the Butterworth filter of Problem 8-7.2 is a sample function
from a random process having an autocorrelation function of

Ry (7) = 10e - i

a) Find the spectral density of the output as a function of $\omega$.
b) Find the value of the spectral density at $\omega = 0$.

8-8.3 A linear system has a transfer function of

5
Be s? + 15s + 50

White noise having a mean-square value of 1.2 V$^2$/Hz is applied to the
input.

a) Write the spectral density of the output.
b) Find the mean-square value of the output.

8-8.4 White noise having a spectral density of 0.8 V$^2$/Hz is applied to the
input of the Butterworth filter of Problem 8-7.2. Find the mean-square value
of the output.


8-9.1 For the system and input of Problem 8-8.2 find both cross-spectral
densities for the input and output.

8-9.2 [Block diagram not reproduced; the only legible label is a white-noise input.]

Derive general expressions for the cross-spectral densities $S_{YZ}(s)$ and
$S_{ZY}(s)$ for the system shown above.

8-9.3 In the system of Problem 8-9.2 let

l
Hy E
and
H,(s) = —
a g+]

Evaluate both cross-spectral densities between Y(t) and Z(t).

8-10.1 a) Find the equivalent-noise bandwidth of the three-pole Butterworth
filter of Problem 8-7.2.

b) Find the half-power bandwidth of the Butterworth filter and
compare it to the equivalent-noise bandwidth.

8-10.2 [Figure: impulse response h(t); the only legible axis label is 2.]

a) Find the equivalent-noise bandwidth of the system whose impulse
response is shown above.

b) If the input to this system is white noise having a spectral density
of 2 V$^2$/Hz, find the mean-square value of the output.

c) Repeat part (b) using the integral of $h^2(t)$.

8-10.3 [Block diagram of a control system not reproduced; the only legible label is the output Y(t).]

a) Find the closed-loop transfer function of the control system shown
above.

b) If the input to this control system is a sample function from
a stationary process having a spectral density of

10
s+2
Sx(s) =

find the mean-square value of the output.

c) For the input of part (b), find the mean-square value of the error
X(t) - Y(t).

8-10.4 A tuned amplifier has a gain of 40 dB at a frequency of 10.7 MHz and
a half-power bandwidth of 1 MHz. The response curve has a shape
equivalent to that of a single-stage parallel RLC circuit. It is found that
thermal noise at the input to the amplifier produces an rms value at the
output of 0.1 V. Find the spectral density of the input thermal noise.

8-10.5 It has been proposed to measure the range to a reflecting object by
transmitting a bandlimited white-noise signal at a carrier frequency of $f_0$
and then adding the received signal to the transmitted signal and measuring
the spectral density of the sum. Periodicities in the amplitude of the
spectral density are related to the range. Using the system model shown
and assuming that $\alpha^2$ is negligible compared to $\alpha$, investigate the
possibility of this approach. What effect that would adversely affect the
measurement has been omitted in the system model?


[Figure for Problem 8-10.5: system model containing a delay $\tau$ and a spectrum analyzer; other labels are not legible.]

8-10.6 It is frequently useful to approximate the shape of a filter by a
Gaussian function of frequency. Determine the standard deviation of a
Gaussian shaped lowpass filter that has a maximum gain of unity and a
half-power bandwidth of W Hz. Find the equivalent-noise bandwidth of a
Gaussian shaped filter in terms of its half-power bandwidth and in terms of
its standard deviation.

8-10.7 The thermal agitation noise generated by a resistance can be closely
approximated as white noise having a spectral density of 2kTR V$^2$/Hz,
where $k = 1.37 \times 10^{-23}$ W-s/°K is the Boltzmann constant, T is the
absolute temperature in degrees Kelvin, and R is the resistance in ohms.
Any physical resistance in an amplifier is paralleled by a capacitance, so
that the equivalent circuit is as shown.

[Circuit diagram not reproduced; legible labels: thermal noise voltage, R, C, amplifier input noise.]

a. Calculate the mean-square value of the amplifier input noise and
show that it is independent of R.

b. Explain this result on a physical basis.

c. Show that the maximum noise power available (that is, with a
matched load) from a resistance is kTB watts, where B is the
equivalent-noise bandwidth over which the power is measured.

8-10.8 Any signal at the input of an amplifier is always accompanied by noise.
The minimum noise theoretically possible is the thermal noise present in
the resistive component of the input impedance as described in Problem
8-10.7. In general the amplifier will add additional noise in the process
of amplifying the signal. The amount of noise is measured in terms of
the deterioration of the signal-to-noise ratio of the signal when it is
passed through the amplifier. A common method of specifying this
characteristic of an amplifier is in terms of a noise figure F, defined as

\[ F = \frac{\text{input signal-to-noise power ratio}}{\text{output signal-to-noise power ratio}} \]

a. Using the above definition, show that the overall noise figure for two
cascaded amplifiers is $F = F_1 + (F_2 - 1)/G_1$ where the individual
amplifiers have power gains of $G_1$ and $G_2$ and noise figures of $F_1$ and
$F_2$ respectively.

b. A particular wide-band video amplifier has a single time constant roll-off
with a half-power bandwidth of 100 MHz, a gain of 100 dB, a noise figure
of 13 dB, and input and output impedances of 300 Ω. Find the rms output
noise voltage when the input signal is zero.

c. Find the amplitude of the input sine wave required to give an output
signal-to-noise power ratio of 10 dB.

CHAPTER 9





Optimum Linear 
Systems 





9—1 Introduction 


It was pointed out previously that almost any practical system has some sort of 
random disturbance introduced into it in addition to the desired signal. The pres- 
ence of this random disturbance means that the system output is never quite what 
it should be, and may deviate very considerably from its desired value. When this 
occurs, it is natural to ask if the system can be modified in any way to reduce the 
effects of the disturbances. Usually it turns out that it is possible to select a system 
impulse response or transfer function that minimizes some attribute of the output 
disturbance. Such a system is said to be optimum. 

The study of optimum systems for various types of desired signals and various 
types of noise disturbance is very involved because of the many different situ- 
ations that can be specified. The literature on the subject is quite extensive and 
the methods used to determine the optimum system are quite general, quite pow- 
erful, and quite beyond the scope of the present discussion. Nevertheless, it is 
desirable to introduce some of the terminology and a few of the basic concepts in 
order that the student be aware of some of the possibilities and be in a better 
position to read the literature. 

One of the first steps in the study of optimum systems is a precise definition of 
what constitutes optimality. Since many different criteria of optimality exist, it is 
necessary to use some care in selecting an appropriate one. This problem is dis- 
cussed in the following section. 

After a criterion has been selected, the next step is to specify the nature of the
system to be considered. Again there are many possibilities, and the ease of
carrying out the optimization may be critically dependent upon the choice. Section
9-3 considers this problem briefly.

Once the optimum system has been determined, there remains the problem of 
evaluating its performance. In some cases this is relatively easy, while in other 
cases it may be more difficult than actually determining the optimum system. No 
general treatment of the evaluation problem will be given here; each case consid- 
ered is handled separately. 

In an actual engineering problem, the final step is to decide if the optimum 
system can be built economically, or whether it will be necessary to approximate 
it. If it turns out, as it often does, that it is not possible to build the true optimum 
system, then it is reasonable to question the value of the optimization techniques. 
Strangely enough, however, it is frequently useful and desirable to carry out the 
optimizing exercise even though there is no intention of attempting to construct 
the optimum system. The reason is that the optimum performance provides a 
yardstick against which the performance of any actual system can be compared. 
Since the optimum performance cannot be exceeded, this comparison clearly in- 
dicates whether any given system needs to be improved or whether its perfor- 
mance is already so close to the optimum that further effort on improving it would 
be uneconomical. In fact, it is probably this type of comparison that provides the 
greatest motivation for studying optimum systems since it is only rarely that the 
true optimum system can actually be constructed. 


9—2 Criteria of Optimality 


Since there are many different criteria of optimality that might be selected, it is 
necessary to establish some guidelines as to what constitutes a reasonable crite- 
rion. In the first place, it is necessary that the criterion satisfy certain require- 
ments, such as: 


1. The criterion must have physical significance and not lead to a trivial
result. For example, if the criterion were that of minimizing the output noise
power, the obvious result would be a system having zero output for both 
signal and noise. This is clearly a trivial result. On the other hand, a cri- 
terion of minimizing the output noise power subject to the constraint of 
maintaining a given output signal power might be quite reasonable. 

2. The criterion must lead to a unique solution. For example, the criterion that
the average error of the output signal be zero can be satisfied by many 
systems, not all equally good in regard to the variance of the error. 

3. The criterion should result in a mathematical form that is capable of being 
solved. This requirement turns out to be a very stringent one and is the 
primary reason why so few criteria have found practical application. As a
consequence, the criterion is often selected primarily on this basis even
though some other criterion might be more desirable in a given situation.


The choice of a criterion is often influenced by the nature of the input signal— 
that is, whether it is deterministic or random. The reason for this is that the pur- 
pose of the system is usually different for these two types of signals. For example, 
if the input signal is deterministic, then its form is known and the purpose in 
observing it is to determine such things as whether it is present or not, the time 
at which it occurs, how big it is, and so on. On the other hand, when the signal 
is random its form is unknown and the purpose of the system is usually to deter- 
mine its form as nearly as possible. In either of these cases there are a number of 
criteria that might make sense. However, only one criterion for each case is dis- 
cussed here, and the one selected is the one that is most common and most easily 
handled mathematically. 

In the case of deterministic signals, the criterion of optimality used here is to 
maximize the output signal-to-noise power ratio at some specified time. This cri- 
terion is particularly useful when the purpose of the system is to detect the pres- 
ence of a signal of known shape or to measure the time at which such a signal 
occurs. There is some flexibility in this criterion with respect to choosing the time 
at which the signal-to-noise ratio is to be maximized, but reasonable choices are 
usually apparent from the nature of the signal. 

In the case of random signals, the criterion of optimality used here is to mini- 
mize the mean-square value of the difference between the actual system output 
and the actual value of the signal being observed. This criterion is particularly 
useful when the purpose of the system is to observe an unknown signal for pur- 
poses of measurement or control. The difference between the output of the system 
and the true value of the signal consists of two components. One component is 
the signal error and represents the difference between the input and output when 
there is no input noise. The second component is the output noise, which also 
represents an error in the output. The total error is the sum of these components, 
and the quantity to be minimized is the mean-square value of this total error. 

Several examples serve to clarify the criteria discussed above and to illustrate 
situations in which they might arise. Maximum output signal-to-noise ratio is a 
very commonly used criterion for radar systems. Radar systems operate by peri- 
odically transmitting very short bursts of radio-frequency energy. The received 
signal is simply one or more replicas of the transmitted signal that are created by 
being reflected from any objects that the transmitted signal illuminates. Thus, the 
form of the received signals is known exactly. The things that are not known 
about received signals are the number of reflections, the time delay between the 
transmitted and received signals, the amplitude, and even whether there is a re- 
ceived signal or not. It can be shown, by methods that are beyond the scope of 
this text, that the probability of detecting a weak radar signal in the presence of 
noise or other interference is greatest when the signal-to-noise ratio is greatest.
Thus, the criterion of maximizing the signal-to-noise ratio is an appropriate one
with respect to the task the radar system is to perform.

A similar situation arises in digital communication systems. In such a system 
the message to be transmitted is converted to a sequence of binary symbols, say 
0 and 1. Each of the two binary symbols is then represented by a time function
having a specified form. For example, a negative rectangular pulse might
represent a 0 and a positive rectangular pulse represent a 1. At the receiver, it is
important that a correct decision be made when each pulse is received as to 
whether it is positive or negative, and this decision may not be easy to make if 
there is a large amount of noise present. Again, the probability of making the 
correct decision is maximized by maximizing the signal-to-noise ratio. 

On the other hand, there are many signals of interest in which the form of the 
signal is not known before it is observed, and the signals can only be observed in 
the presence of noise. For example, in an analog communication system the mes- 
sages, such as speech or music, are not converted to binary signals but are trans- 
mitted in their original form after an appropriate modulation process. At the re- 
ceiver, it is desired to recover these messages in a form that is as close to the 
original message as possible. In this case, minimizing the mean-square error
between the received message and the transmitted message is the appropriate
criterion. Another situation in which this is the appropriate criterion is the
measurement of biological signals such as is done for electroencephalograms and
electrocardiograms. Here it is important that an accurate representation of the
signal be obtained and that the effects of noise be minimized as much as possible.

The above discussion may be summarized more succinctly by the following 
statements that are true in general: 


a) To determine the presence or absence of a signal of known form, use the
maximum output signal-to-noise ratio criterion. 

b) To determine the form of a signal that is known to be present, use the 
minimum mean-square error criterion. 


There are, of course, situations that are not encompassed by either of the above 
general rules, but treatment of such situations is beyond the scope of this book. 





Exercise 9—2.1 


For each of the following situations, state whether the criterion of op- 
timality should be (1) maximum signal-to-noise ratio or (2) minimum 
mean-square error. 


a) Picking up noise signals from distant radio stars 
b) Listening to music from a conventional record

c) Listening to music from a digital recording

d) Communication links between computers 

e) Using a cordless telephone 

f) Detecting flaws in large castings with an ultrasonic flaw detec- 
tor 


Answers: 1,1,1,1,2,2 





9—3 Restrictions on the Optimum System 


It is usually necessary to impose some sort of restriction on the type of system 
that will be permitted. The most common restriction is that the system must be 
causal,' since this is a fundamental requirement of physical realizability. It fre- 
quently is true that a noncausal system, which can respond to future values of the 
input, could do a better job of satisfying the chosen criterion than any physically 
realizable system. A noncausal system usually cannot be built, however, and does 
not provide a fair comparison with real systems, so is usually inappropriate. A 
possible exception to this rule arises when the data available to the system is in 
recorded form so that future values can, in fact, be utilized. 

Another common assumption is that the system is linear. The major reason for
this assumption is that it is usually not possible to carry out the analytical solution 
for the optimum nonlinear system. In many cases, particularly those involving 
Gaussian noise, it is possible to show that there is no nonlinear system that will 
do a better job than the optimum linear system. However, in the more general 
case, the linear system may not be the best. Nevertheless, the difficulty of deter- 
mining the optimum nonlinear system is such that it is usually not feasible to hunt 
for it. 

With present day technology it is becoming more and more common to approx- 
imate an analog system with a digital system. Such an implementation may elim- 
inate the need for large capacitances and inductances and, thus, reduce the phys- 
ical size of the optimum system. Furthermore, it may be possible to implement, 
in an economical fashion, systems that would be too complex and too costly to 
build as analog systems. It is not the intent of the discussion here to consider the 
implementation of such digital systems since this is a subject that is too vast to be 
dealt with in a single chapter. The reader should be aware, however, that while 
digital approximations to very complex system functions are indeed possible, there
are always errors that arise due to both the discrete-time nature of the operation
and the necessary quantization of amplitudes. Thus, the discussion of errors in
analog systems in the following sections is not adequate for all of the sources of
error that arise in a digital system.

¹By causal we mean that the system impulse response satisfies the condition h(t) = 0, t < 0 [see
equation (8-3)]. In addition, the stability condition of equation (8-4) is also assumed to apply.

Once a reasonable criterion has been selected, and the system restricted to being 
causal and linear, then it is usually possible to find the impulse response or trans- 
fer function that optimizes the criterion. However, in some cases it may be desir- 
able to further restrict the system to a particular form. The reason for such a 
restriction is usually that it guarantees a system having a given complexity (and, 
hence, cost) while the more general optimization may yield a system that is costly 
or difficult to approximate. An example of this specialized type of optimization
will be considered in the next section. 


9—4 Optimization by Parameter Adjustment 


As suggested by the title, this method of optimization is carried out by specifying 
the form of the system to be used and then by finding the values of the compo- 
nents of that system that optimize the selected criterion. This procedure has the 
obvious advantage of yielding a system whose complexity can be predetermined 
and, hence, has its greatest application in cases in which the complexity of the 
system is of critical importance because of size, weight, or cost considerations. 
The disadvantage is that the performance of this type of optimum system will 
never be quite as good as that of a more general system whose form is not speci- 
fied in advance. Any attempt to improve performance by picking a slightly more 
complicated system to start out with leads to analytical problems in determining 
the optimum values of more than one parameter (because the simultaneous equa- 
tions that must be solved are seldom linear), although computer solutions are quite 
possible. As a practical matter, analytical solutions are usually limited to a single 
parameter. Two different examples are discussed in order to illustrate the basic 
ideas. 

As a first example, assume that the signal consists of a rectangular pulse as 
shown in Figure 9-1(a) and that this signal is combined with white noise having 
a spectral density of $N_o$. Since the form of the signal is known, the objective of 
the system is to detect the presence of the signal. As noted earlier, a reasonable 
criterion for this purpose is to find that system maximizing the output signal-to- 
noise power ratio at some instant of time. That is, if the output signal is $s_o(t)$, and 
the mean-square value of the output noise is $\overline{M^2}$, then it is desired to find the 
system that will maximize the ratio $s_o^2(t_o)/\overline{M^2}$, where $t_o$ is the time chosen for this 
to be a maximum. 

[Figure 9-1 panels: (a) rectangular pulse s(t) of amplitude A extending from t = 0 to t = T; 
(b) RC filter with input s(t) + N(t), output $s_o(t)$ + M(t), and b = 1/RC.] 

Figure 9-1 Signal and system for maximizing signal-to-noise ratio: (a) signal to be 
detected and (b) specified form of optimum system. 

In the method of parameter adjustment, the form of the system is specified and 
in this case is assumed to be a simple RC circuit, as shown in Figure 9-1(b). The 
parameter to be adjusted is the time constant of the filter or, rather, the reciprocal 
of this time constant. One of the first steps is to select a time $t_o$ at which the 
signal-to-noise ratio is a maximum. An appropriate choice for $t_o$ becomes apparent 
when the output signal component is considered. This output signal is given by 


$$s_o(t) = A\left[1 - e^{-bt}\right], \qquad 0 \le t < T$$
$$s_o(t) = A\left[1 - e^{-bT}\right]e^{-b(t-T)}, \qquad T \le t < \infty \tag{9-1}$$

and is sketched in Figure 9-2. This result is arrived at by any of the conventional 
methods of system analysis. It is clear from this sketch that the output signal 
component has its largest value at time T. Hence, it is reasonable to choose $t_o = T$, 
and thus 

$$s_o(t_o) = A\left(1 - e^{-bT}\right) \tag{9-2}$$

The mean-square value of the output noise from this type of circuit has been 
considered several times before and has been shown to be 

$$\overline{M^2} = \frac{bN_o}{2} \tag{9-3}$$

Figure 9-2 Signal component at the RC filter output. 

Hence, the signal-to-noise ratio to be maximized is 


$$\frac{s_o^2(t_o)}{\overline{M^2}} = \frac{A^2\left(1 - e^{-bT}\right)^2}{bN_o/2} \tag{9-4}$$


Before carrying out the maximization, it is worth noting that this ratio is zero for 
both $b = 0$ and $b = \infty$, and that it is positive for all other positive values of b. 
Hence, there must be some positive value of b for which the ratio is a maximum. 


In order to find the value of b that maximizes the ratio, (9-4) is differentiated 
with respect to b and the derivative equated to zero. Thus, 

$$\frac{d}{db}\left[\frac{s_o^2(t_o)}{\overline{M^2}}\right] = \frac{2A^2}{N_o}\cdot\frac{2bT\,e^{-bT}\left(1 - e^{-bT}\right) - \left(1 - e^{-bT}\right)^2}{b^2} = 0 \tag{9-5}$$

This can be simplified to yield the nontrivial equation 

$$2bT + 1 = e^{bT} \tag{9-6}$$

This equation is easily solved for bT by trial-and-error methods and leads to 

$$bT = 1.256 \tag{9-7}$$

from which the optimum time constant is 

$$RC = \frac{1}{b} = \frac{T}{1.256} \tag{9-8}$$

This, then, is the value of time constant that should be used in the RC filter in 
order to maximize the signal-to-noise ratio at time T. 
The next step in the procedure is to determine how good the filter actually is. 
This is easily done by substituting the optimum value of bT, as given by (9-7), 
into the signal-to-noise ratio of (9-4). When this is done, it is found that 

$$\left[\frac{s_o^2(t_o)}{\overline{M^2}}\right]_{\max} = \frac{0.8145\,A^2T}{N_o} \tag{9-9}$$

It may be noted that the energy of the pulse is $A^2T$, so that the maximum signal- 
to-noise ratio is proportional to the ratio of the signal energy to the noise spectral 
density. This is typical of all cases of maximizing signal-to-noise ratio in the 
presence of white noise. In a later section, it is shown that if the form of the 
optimum system were not specified as it was here, but allowed to be general, the 
constant of proportionality would be 1.0 instead of 0.8145. The reduction in sig- 
nal-to-noise ratio encountered in this example may be considered as the price that 
must be paid for insisting on a simple filter. The loss is not serious here, but in 
other cases it may be appreciable. 
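
The trial-and-error solution of (9-6) and the constant in (9-9) can be checked numerically. The following is only a minimal sketch: the function names and the bracketing interval [0.5, 3.0] are illustrative assumptions, and bisection is used here simply as one convenient stand-in for trial and error.

```python
import math

def solve_bT(lo=0.5, hi=3.0, tol=1e-10):
    """Bisection on g(x) = exp(x) - 2x - 1; its positive root is the optimum bT of (9-6)."""
    g = lambda x: math.exp(x) - 2.0 * x - 1.0
    # g(0.5) < 0 and g(3.0) > 0, so the nontrivial root is bracketed (x = 0 is the trivial root).
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(lo) * g(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def snr_constant(x):
    """Multiplier of A^2 T / N_o obtained from (9-4): K = 2 (1 - exp(-bT))^2 / (bT)."""
    return 2.0 * (1.0 - math.exp(-x)) ** 2 / x

bT_opt = solve_bT()
K_opt = snr_constant(bT_opt)
print(f"optimum bT = {bT_opt:.4f}, proportionality constant = {K_opt:.4f}")
```

The two printed values can be compared with (9-7) and (9-9).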

The final step, which is frequently omitted, is to determine just how sensitive 
the signal-to-noise ratio is to the choice of the parameter b. This is most easily 
done by sketching the proportionality constant in (9-4) as a function of b. Since 
this constant is simply 

$$K = \frac{2\left(1 - e^{-bT}\right)^2}{bT} \tag{9-10}$$

the result is as shown in Figure 9-3. It is clear from this sketch that the output 
signal-to-noise ratio does not change rapidly with b in the vicinity of the maximum, 
so that it is not very important to have precisely the right value of time 
constant in the optimum filter. 

The fact that this particular system is not very sensitive to the value of the 
parameter should not be construed to imply that this is always the case. If, for 
example, the signal were a sinusoidal pulse and the system a resonant circuit, the 
performance is critically dependent upon the resonant frequency and, hence, upon 
the values of inductance and capacitance. 
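
For the RC example, the flatness of the maximum can be illustrated by tabulating K of (9-10) for a few detuned values of bT. This is a sketch only; the detuning factors chosen below are arbitrary assumptions and are not taken from the text.

```python
import math

def snr_constant(x):
    """K of (9-10) as a function of x = bT."""
    return 2.0 * (1.0 - math.exp(-x)) ** 2 / x

bT_opt = 1.256                      # optimum value from (9-7)
K_opt = snr_constant(bT_opt)

# Percent loss in output signal-to-noise ratio when b is scaled away from its optimum.
loss_percent = {}
for scale in (0.5, 0.8, 1.0, 1.2, 2.0):
    loss_percent[scale] = 100.0 * (1.0 - snr_constant(scale * bT_opt) / K_opt)
    print(f"b = {scale:.1f} x optimum: loss = {loss_percent[scale]:.2f} %")
```

Under these assumptions a 20 percent error in the time constant costs only a few percent in signal-to-noise ratio, while a factor-of-two error is considerably more expensive.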

The second example of optimization by parameter adjustment will consider a 
random signal and employ the minimum mean-square error criterion. In this ex- 
ample the system will be an ideal lowpass filter, rather than a specified circuit 
configuration, and the parameter to be adjusted will be the bandwidth of that filter. 

Figure 9-3 Output signal-to-noise ratio as function of the parameter bT. 

Assume that the signal X(t) is a sample function from a random process having 
a spectral density of 

$$S_X(\omega) = \frac{A^2}{\omega^2 + (2\pi f_a)^2} \tag{9-11}$$

Added to this signal is white noise N(t) having a spectral density of $N_o$. These are 
illustrated in Figure 9-4 along with the power transfer characteristic of the ideal 
lowpass filter. 

Since the filter is an ideal lowpass filter, the error in the output signal component, 
$E(t) = X(t) - Y(t)$, will be due entirely to that portion of the signal spectral 
density falling outside the filter pass band. Its mean-square value can be obtained 
by integrating the signal spectral density over the region outside of $\pm 2\pi B$. Because 
of symmetry, only one side need be evaluated and then doubled. Hence 

$$\overline{E^2} = \frac{2}{2\pi}\int_{2\pi B}^{\infty} \frac{A^2}{\omega^2 + (2\pi f_a)^2}\,d\omega = \frac{A^2}{2\pi^2 f_a}\left[\frac{\pi}{2} - \tan^{-1}\frac{B}{f_a}\right] \tag{9-12}$$

The noise out of the filter, M(t), has a mean-square value of 

$$\overline{M^2} = \frac{1}{2\pi}\int_{-2\pi B}^{2\pi B} N_o\,d\omega = 2BN_o \tag{9-13}$$

The total mean-square error is the sum of these two (since signal and noise are 
statistically independent) and is the quantity that is to be minimized by selecting 
B. Thus, 

$$\overline{E^2} + \overline{M^2} = \frac{A^2}{2\pi^2 f_a}\left[\frac{\pi}{2} - \tan^{-1}\frac{B}{f_a}\right] + 2BN_o \tag{9-14}$$


[Figure 9-4: signal spectral density, noise spectral density, and ideal lowpass filter 
characteristic, with filter input X(t) + N(t).] 

Figure 9-4 Signal and noise spectral densities, filter characteristic. 

The minimization is accomplished by differentiating (9-14) with respect to B and 
setting the result equal to zero. Thus, 

$$\frac{A^2}{2\pi^2 f_a}\left[\frac{-1/f_a}{1 + (B/f_a)^2}\right] + 2N_o = 0$$

from which it follows that 

$$B = \left(\frac{A^2}{4\pi^2 N_o} - f_a^2\right)^{1/2} \tag{9-15}$$



is the optimum value. The actual value of the minimum mean-square error can be 
obtained by substituting this value into (9—14). 
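
As a numerical cross-check, the total error of (9-14) can be minimized directly and the result compared with the closed form (9-15). The sketch below is illustrative only: the parameter values for A, f_a, and N_o, the search interval, and the use of a golden-section search are all assumptions made for the example.

```python
import math

def total_mse(B, A, f_a, N_o):
    """Total mean-square error (9-14) for an ideal lowpass filter of bandwidth B (Hz)."""
    signal_error = A**2 / (2.0 * math.pi**2 * f_a) * (math.pi / 2.0 - math.atan(B / f_a))
    noise_power = 2.0 * B * N_o
    return signal_error + noise_power

def optimum_bandwidth(A, f_a, N_o):
    """Closed form (9-15); returns zero when the quantity under the root is not positive."""
    radicand = A**2 / (4.0 * math.pi**2 * N_o) - f_a**2
    return math.sqrt(radicand) if radicand > 0.0 else 0.0

def golden_section_min(f, lo, hi, tol=1e-9):
    """Minimize a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    while hi - lo > tol:
        c = hi - inv_phi * (hi - lo)
        d = lo + inv_phi * (hi - lo)
        if f(c) < f(d):
            hi = d
        else:
            lo = c
    return 0.5 * (lo + hi)

A, f_a, N_o = 10.0, 1.0, 0.1        # hypothetical values chosen for illustration
B_closed = optimum_bandwidth(A, f_a, N_o)
B_numeric = golden_section_min(lambda B: total_mse(B, A, f_a, N_o), 0.0, 50.0)
print(f"closed form B = {B_closed:.4f} Hz, numerical B = {B_numeric:.4f} Hz")
print(f"minimum total error = {total_mse(B_closed, A, f_a, N_o):.4f}")
```

The search is valid here because the derivative of (9-14) increases monotonically with B, so the error has a single minimum.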

The form of (9-15) is not easy to interpret. A somewhat simpler form can be 
obtained by noting that the mean-square value of the signal is 

$$\overline{X^2} = \frac{A^2}{4\pi f_a}$$

and that the mean-square value of that portion of the noise contained within the 
equivalent-noise bandwidth of the signal is just 

$$N_X = \pi f_a N_o$$

since the equivalent-noise bandwidth of the signal is $(\pi/2)f_a$. Hence, (9-15) can 
be written as 

$$B = f_a\left(\frac{\overline{X^2}}{N_X} - 1\right)^{1/2}, \qquad \overline{X^2} > N_X \tag{9-16}$$





and sketched as in Figure 9—5. 

It is of interest to note from Figure 9-5 that the optimum bandwidth of the filter 
is zero when the mean-square value of the signal into the filter is equal to the 
mean-square value of the noise within the equivalent-noise bandwidth of the signal. 
Under these circumstances there is no signal and no noise at the output of 
the filter. Thus, the minimum mean-square error is just the mean-square value of the 
signal. For smaller values of signal mean-square value the optimum bandwidth 
remains at zero and the minimum mean-square error is still the mean-square value 
of the signal. 

Figure 9-5 Optimum bandwidth. 

The example just discussed is not a practical example as it stands because it 
uses an ideal lowpass filter, which cannot be realized with a finite number of 
elements. This filter was chosen for reasons of analytical convenience rather than 
practicality. However, a practical filter with a transfer function that drops off 
much more rapidly than the rate at which the signal spectral density drops off 
would produce essentially the same result as the ideal lowpass filter. Thus, this 
simple analysis can be used to find the optimum bandwidth of a practical lowpass 
filter with a sharp cutoff. One should not conclude, however, that this simplified 
approach would work for any practical filter. For example, if a simple RC circuit, 
such as that shown in Figure 9—1(b), were used, the optimum filter bandwidth is 
quite different from that given by (9—15). This is illustrated in Exercise 9—4.2 
below. 

In each of the examples just discussed only one parameter of the system was 
adjusted in order to optimize the desired criterion. The procedure for adjusting 
two or more parameters is quite similar. That is, the quantity to be maximized or 
minimized is differentiated with respect to each of the parameters to be adjusted 
and the derivatives set equal to zero. This yields a set of simultaneous equations 
that can, in principle, be solved to yield the desired parameter values. As a prac- 
tical matter, however, this procedure is only rarely possible because the equations 
are usually nonlinear and analytical solutions are not known. Computer solutions 
may be obtained frequently, but there may be unresolved questions concerning 
uniqueness. 





Exercise 9—4.1 


A rectangular pulse defined by 

$$s(t) = 2[u(t) - u(t - 1)]$$

is combined with white noise having a spectral density of 2 V²/Hz. It 
is desired to maximize the signal-to-noise ratio at the output of a finite- 
time integrator whose impulse response is 

$$h(t) = \frac{1}{T}[u(t) - u(t - T)]$$

a) Find the value of T that maximizes the output signal-to-noise 
ratio. 


b) Find the value of the maximum output signal-to-noise ratio. 

c) If the integration time, T, is changed by 10% on either side of 
the optimum value, find the percent drop in output signal-to- 
noise ratio. 


Answers: 1, 2, 9.09, 10 


Exercise 9—4.2 


A signal having a spectral density of 

$$S_X(\omega) = \frac{40}{\omega^2 + 2.25}$$

and white noise having a spectral density of 1 V²/Hz are applied to a 
lowpass RC filter having a transfer function of 

$$H(\omega) = \frac{b}{b + j\omega}$$

a) Find the value of b that minimizes the mean-square error between 
the input signal and the total filter output. 


b) Find the value of the minimum mean-square error. 

c) Find the value of mean-square error that is approached as b 
approaches 0. 


Answers: 4.82, 5.57, 13.33 





9—5 Systems That Maximize Signal-to-Noise 
Ratio 


This section will consider systems that maximize signal-to-noise ratio at a speci- 
fied time, when the form of the signal is known. The form of the system is not 
specified; the only restrictions on the system being that it must be causal and 
linear. 

The notation is illustrated in Figure 9-6. The signal s(t) is deterministic and 
assumed to be known (except, possibly, for amplitude and time of occurrence). 
The noise N(t) is assumed to be white with a spectral density of $N_o$. Although the 
case of nonwhite noise is not considered here (except for a brief mention at the 
end of this section), the same general procedure can be used for it also. The output 
signal-to-noise ratio is defined to be $s_o^2(t_o)/\overline{M^2}$, and the time $t_o$ is to be selected. 
The objective is to find the form of h(t) that maximizes this output signal-to-noise 
ratio. 

[Figure 9-6: filter h(t) with input s(t) + N(t) and output $s_o(t)$ + M(t).] 

Figure 9-6 Notation for optimum filter. 

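In the first place, the output signal is given by 

$$s_o(t) = \int_0^{\infty} h(\lambda)\,s(t - \lambda)\,d\lambda \tag{9-17}$$

and the mean-square value of the output noise is, for a white noise input, given 
by 

$$\overline{M^2} = N_o\int_0^{\infty} h^2(\lambda)\,d\lambda \tag{9-18}$$

Hence, the signal-to-noise ratio at time $t_o$ is 

$$\frac{s_o^2(t_o)}{\overline{M^2}} = \frac{\left[\int_0^{\infty} h(\lambda)\,s(t_o - \lambda)\,d\lambda\right]^2}{N_o\int_0^{\infty} h^2(\lambda)\,d\lambda} \tag{9-19}$$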


In order to maximize this ratio it is convenient to use the Schwarz inequality. 
This inequality states that for any two functions, say f(t) and g(t), 

$$\left[\int_a^b f(t)\,g(t)\,dt\right]^2 \le \int_a^b f^2(t)\,dt \int_a^b g^2(t)\,dt \tag{9-20}$$

Furthermore, the equality holds if and only if $f(t) = kg(t)$, where k is independent 
of t. 

Using the Schwarz inequality on (9-19) leads to 

$$\frac{s_o^2(t_o)}{\overline{M^2}} \le \frac{\int_0^{\infty} h^2(\lambda)\,d\lambda \int_0^{\infty} s^2(t_o - \lambda)\,d\lambda}{N_o\int_0^{\infty} h^2(\lambda)\,d\lambda} \tag{9-21}$$

From this it is clear that the maximum value of the signal-to-noise ratio occurs 
when the equality holds, and that this maximum value is just 

$$\left[\frac{s_o^2(t_o)}{\overline{M^2}}\right]_{\max} = \frac{1}{N_o}\int_0^{\infty} s^2(t_o - \lambda)\,d\lambda \tag{9-22}$$

since the integrals of $h^2(\lambda)$ cancel out. Furthermore, the condition that is required 
for the equality to hold is 


$$h(\lambda) = k\,s(t_o - \lambda)\,u(\lambda) \tag{9-23}$$

Since the k is simply a gain constant that does not affect the signal-to-noise ratio, 
it can be set equal to any value; a convenient value is k = 1. The $u(\lambda)$ has been 
added to guarantee that the system is causal. Note that the desired impulse response 
is simply the signal waveform run backwards in time and delayed by $t_o$ 
seconds. 

The right side of (9-22) can be written in slightly different form by letting 
$t = t_o - \lambda$. Upon making this change of variable, the integral becomes 

$$\int_0^{\infty} s^2(t_o - \lambda)\,d\lambda = \int_{-\infty}^{t_o} s^2(t)\,dt = \varepsilon(t_o) \tag{9-24}$$

and it is clear that this is simply the energy in the signal up to the time the signal- 
to-noise ratio is to be maximized. This signal energy is designated as $\varepsilon(t_o)$. 
To summarize, then: 


1. The output signal-to-noise ratio at time $t_o$ is maximized by a filter whose 
impulse response is 

$$h(t) = s(t_o - t)\,u(t) \tag{9-25}$$

2. The value of the maximum signal-to-noise ratio is 

$$\left[\frac{s_o^2(t_o)}{\overline{M^2}}\right]_{\max} = \frac{\varepsilon(t_o)}{N_o} \tag{9-26}$$

where $\varepsilon(t_o)$ is the energy in s(t) up to the time $t_o$. 


The filter defined by (9-25) is usually referred to as a matched filter. 
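
The two summary results can be illustrated numerically by evaluating (9-19) with Riemann sums, once for the matched filter of (9-25) and once for the RC filter of Section 9-4. This is a rough sketch under stated assumptions: the pulse parameters A, T, and N_o, the step size, and the helper names are all hypothetical choices for the example.

```python
import math

A, T, N_o = 2.0, 1.0, 0.5           # illustrative pulse amplitude, duration, noise density

def s(t):
    """Rectangular pulse of amplitude A on 0 <= t <= T."""
    return A if 0.0 <= t <= T else 0.0

def output_snr(h, t_o, t_max=20.0, dt=1e-3):
    """Midpoint-rule evaluation of (9-19) for a causal impulse response h."""
    n = int(round(t_max / dt))
    sig = 0.0
    noise = 0.0
    for k in range(n):
        lam = (k + 0.5) * dt
        h_val = h(lam)
        sig += h_val * s(t_o - lam) * dt      # numerator integral of (9-19)
        noise += h_val * h_val * dt           # denominator integral of (9-19)
    return sig * sig / (N_o * noise)

def h_matched(t):
    """Matched filter (9-25) with t_o = T: the pulse reversed and delayed by T."""
    return s(T - t) if t >= 0.0 else 0.0

b = 1.256 / T                                  # optimum RC parameter from (9-7)

def h_rc(t):
    """Impulse response of the RC filter b / (b + jw)."""
    return b * math.exp(-b * t) if t >= 0.0 else 0.0

snr_matched = output_snr(h_matched, T)
snr_rc = output_snr(h_rc, T)
print(f"matched filter: {snr_matched:.4f}   (A^2 T / N_o = {A * A * T / N_o:.4f})")
print(f"RC filter:      {snr_rc:.4f}   (ratio to matched = {snr_rc / snr_matched:.4f})")
```

With these assumed values the ratio of the two results reproduces the constant of proportionality found for the RC filter in (9-9).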

As a first example of this procedure, consider again the case of the rectangular 
signal pulse, as shown in Figure 9-7(a), and find the h(t) that will maximize the 
signal-to-noise ratio at $t_o = T$. The reversed and translated signal is shown (for 
an arbitrary $t_o$) in Figure 9-7(b). The resulting impulse response for $t_o = T$ is 
shown in Figure 9-7(c) and is represented mathematically by 

$$h(t) = A, \qquad 0 \le t \le T$$
$$h(t) = 0 \qquad \text{elsewhere} \tag{9-27}$$

The maximum signal-to-noise ratio is given by 

$$\left[\frac{s_o^2(t_o)}{\overline{M^2}}\right]_{\max} = \frac{\varepsilon(t_o)}{N_o} = \frac{A^2T}{N_o} \tag{9-28}$$

This result may be compared with (9-9). 

[Figure 9-7 panels: (a) s(t), nonzero from 0 to T; (b) $s(t_o - t)$, nonzero from $t_o - T$ to $t_o$; 
(c) h(t) = s(T - t).] 

Figure 9-7 Matched filter for a rectangular pulse: (a) signal, (b) reversed, translated 
signal, and (c) optimum filter for $t_o = T$. 


In order to see the effect of changing the value of $t_o$, the sketches of Figure 9-8 
are presented. The sketches show $s(t_o - t)$, h(t), and the output signal $s_o(t)$, all 
for the same input s(t) shown in Figure 9-7(a). It is clear from these sketches that 
making $t_o < T$ decreases the maximum signal-to-noise ratio because not all of the 
energy of the pulse is available at time $t_o$. On the other hand, making $t_o > T$ does 
not further increase the output signal-to-noise ratio, since all of the pulse energy 
is available by time T. It is also clear that the signal out of the matched filter does 
not have the same shape as the input signal. Thus, the matched filter is not suitable 
if the objective of the filter is to recover a nearly undistorted rectangular 
pulse. 

As a second example of matched filters, it is of interest to consider a signal 
having finite energy but infinite time duration. Such a signal might be 

$$s(t) = Ae^{-bt}\,u(t) \tag{9-29}$$

as shown in Figure 9-9. For some arbitrarily selected $t_o$, the optimum matched 
filter is 

$$h(t) = Ae^{-b(t_o - t)}\,u(t_o - t)\,u(t) \tag{9-30}$$

and is also shown. The maximum signal-to-noise ratio depends upon $t_o$, since the 
available energy increases with $t_o$. In this case it is 

$$\left[\frac{s_o^2(t_o)}{\overline{M^2}}\right]_{\max} = \frac{\varepsilon(t_o)}{N_o} = \frac{A^2}{2bN_o}\left[1 - e^{-2bt_o}\right] \tag{9-31}$$

It is clear that this approaches a limiting value of $A^2/2bN_o$ as $t_o$ is made large. 
Hence, the choice of $t_o$ is governed by how close to this limit one wants to 
come, remembering that larger values of $t_o$ generally represent a more costly 
system. 
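
The trade-off between $t_o$ and the achievable signal-to-noise ratio in (9-31) can be made concrete with a short calculation. The sketch below assumes unit values for A, b, and N_o purely for illustration; the function names are hypothetical.

```python
import math

def snr_exponential(t_o, A, b, N_o):
    """Maximum output signal-to-noise ratio (9-31) for s(t) = A exp(-bt) u(t)."""
    return A**2 / (2.0 * b * N_o) * (1.0 - math.exp(-2.0 * b * t_o))

def time_for_fraction(fraction, b):
    """Smallest t_o at which (9-31) reaches the given fraction of its limit A^2 / (2 b N_o)."""
    return -math.log(1.0 - fraction) / (2.0 * b)

A, b, N_o = 1.0, 1.0, 1.0           # illustrative values only
limit = A**2 / (2.0 * b * N_o)
for fraction in (0.5, 0.9, 0.95, 0.99):
    t_o = time_for_fraction(fraction, b)
    print(f"{100 * fraction:.0f}% of the limit requires t_o = {t_o:.3f} "
          f"(SNR = {snr_exponential(t_o, A, b, N_o):.4f}, limit = {limit:.4f})")
```

Because the required $t_o$ grows only logarithmically as the fraction approaches unity, most of the available signal-to-noise ratio is obtained within a few time constants of the signal.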
Figure 9-8 Optimum filters and responses for various values of $t_o$. 

Figure 9-9 Matched filter for an exponential signal. 

The third and final illustration of the matched filter considers signals having 
both infinite energy and infinite time duration, that is, power signals. Any periodically 
repeated waveform is an example of this type of signal. A case of considerable 
interest is that of periodically repeated RF pulses such as would be used 
in a pulse radar system. Figure 9-10 shows such a signal, the corresponding 
reversed, translated signal, and the impulse response of the matched filter. In this 
sketch, $t_o$ has been shown as containing an integral number of pulses, but this is 
not necessary. Since the energy per pulse is just $\tfrac{1}{2}A^2t_p$, the signal-to-noise ratio 
out of a filter matched to N such pulses is 

$$\left[\frac{s_o^2(t_o)}{\overline{M^2}}\right]_{\max} = \frac{NA^2t_p}{2N_o} \tag{9-32}$$

It is clear that this signal-to-noise ratio continues to increase as the number of 
pulses included in the matched filter increases. However, it becomes very difficult 
to build filters that are matched for very large values of N, so usually N is a 
number less than 10. 

Although it is not intended to discuss the case of nonwhite noise in any detail, 
it may be noted that all that is needed in order to apply the above matched filter 
concepts is to precede the matched filter with a network that converts the nonwhite 
noise into white noise. Such a device is called a whitening filter and has a power 
transfer function that is the reciprocal of the noise spectral density. Of course, the 
whitening filter changes the shape of the signal so that the subsequent matched 
filter has to be matched to this new signal shape rather than to the original signal 
shape. 

[Figure 9-10 panels: s(t), a train of N RF pulses each of duration $t_p$; $s(t_o - t)$; h(t).] 

Figure 9-10 Matched filter for N pulses. 

An interesting phenomenon known as singular detection may arise sometimes 
for certain combinations of input signal and nonwhite noise spectral density. Suppose, 
for example, that the nonwhite noise has a spectral density of 

$$S_N(\omega) = \frac{1}{\omega^2 + 1}$$

The power transfer function of the whitening filter that converts this spectral density 
to white noise is 

$$|H(\omega)|^2 = \frac{1}{S_N(\omega)} = \omega^2 + 1 = (j\omega + 1)(-j\omega + 1)$$

Thus, the voltage transfer function of the whitening filter is 

$$H(\omega) = j\omega + 1$$

and this corresponds to an impulse response of 

$$h(t) = \dot{\delta}(t) + \delta(t)$$

Hence, for any input signal s(t), the output of the whitening filter is $\dot{s}(t) + s(t)$. 
If the input signal is a rectangular pulse as shown in Figure 9-7(a), the output of 
the whitening filter will contain two $\delta$ functions because of the differentiation 
action of the filter. Since any $\delta$ function contains infinite energy, it can always be 
detected regardless of how small the input signal might be. The same result would 
occur for any input signal having a discontinuity in amplitude. This is clearly not 
a realistic situation and arises because the input signal is modeled in an idealistic 
way. Actual signals can never have discontinuities and, hence, singular detection 
never actually occurs. Nevertheless, the fact that the analysis suggests this possi- 
bility emphasizes the importance of using realistic mathematical models for sig- 
nals when matched filters for nonwhite noise are considered. 





Exercise 9—5.1 


A signal of the form 
s(t) = 1.5t[u(t) - u(t - 2)] 


is to be detected in the presence of white noise having a spectral 
density of 0.15 V?/Hz using a matched filter. 




a) Find the smallest value of $t_o$ that will yield the maximum output 
signal-to-noise ratio. 


b) For this value of $t_o$ find the value of the matched-filter impulse 
response at t = 0, 1, and 2. 


c) Find the maximum output signal-to-noise ratio. 


Answers: 0,1.5,2,3,5 


Exercise 9—5.2 


A signal of the form 

$$s(t) = 5e^{-(t+2)}\,u(t + 2)$$

is to be detected in the presence of white noise having a spectral 
density of 0.25 V²/Hz using a matched filter. 


a) For $t_o = 2$ find the value of the impulse response of the 
matched filter at t = 0, 2, 4. 


b) Find the maximum output signal-to-noise ratio that can be 
achieved if $t_o = \infty$. 


c) Find the value of $t_o$ that should be used to achieve an output 
signal-to-noise ratio that is 0.95 of that achieved in part (b). 


Answers:  —0.502, 0.0916, 0.677, 5, 50 





9—6 Systems That Minimize Mean-Square 
Error 


This section considers systems that minimize the mean-square error between the 
total system output and the input signal component when that signal is from a 
stationary random process. The form of the system is not specified in advance, 
but it is restricted to be linear and causal. 

It is convenient to use s-plane notation in carrying out this analysis, although it 
can be done in the time-domain as well. The notation is illustrated in Figure 9-11, 
in which the random input signal X(t) is assumed to have a spectral density 
of $S_X(s)$, while the input noise N(t) has a spectral density of $S_N(s)$. The output 
spectral densities for these components are $S_Y(s)$ and $S_M(s)$, respectively. There is 
no particular simplification from assuming the input noise to be white (as there 
was in the case of the matched filter), so it will not be done here. 

[Figure 9-11: system H(s) with input $S_X(s) + S_N(s)$ and output $S_Y(s) + S_M(s)$.] 

Figure 9-11 Notation for the optimum system. 

The error in the signal component, produced by the system, is defined as before 
by 

$$E(t) = X(t) - Y(t)$$

and its Laplace transform is 

$$F_E(s) = F_X(s) - F_Y(s) = F_X(s) - H(s)F_X(s) = F_X(s)[1 - H(s)] \tag{9-33}$$

Hence, $1 - H(s)$ is the transfer function relating the signal error to the input 
signal, and the mean-square value of the signal error is given by 

$$\overline{E^2} = \frac{1}{2\pi j}\int_{-j\infty}^{j\infty} S_X(s)[1 - H(s)][1 - H(-s)]\,ds \tag{9-34}$$

The noise appearing at the system output is M(t), and its mean-square value is 

$$\overline{M^2} = \frac{1}{2\pi j}\int_{-j\infty}^{j\infty} S_N(s)H(s)H(-s)\,ds \tag{9-35}$$

The total mean-square error is $\overline{E^2} + \overline{M^2}$ (since signal and noise are statistically 
independent) and may be expressed as 

$$\overline{E^2} + \overline{M^2} = \frac{1}{2\pi j}\int_{-j\infty}^{j\infty} \left\{S_X(s)[1 - H(s)][1 - H(-s)] + S_N(s)H(s)H(-s)\right\}ds \tag{9-36}$$

The objective now is to find the form of H(s) that minimizes (9—36). 

If there were no requirement that the system be causal, finding the optimum 
value of H(s) would be very simple. In order to do this, rearrange the terms in 
(9-36) as 

$$\overline{E^2} + \overline{M^2} = \frac{1}{2\pi j}\int_{-j\infty}^{j\infty} \left\{[S_X(s) + S_N(s)]H(s)H(-s) - S_X(s)H(s) - S_X(s)H(-s) + S_X(s)\right\}ds \tag{9-37}$$

Since $[S_X(s) + S_N(s)]$ is also a spectral density, it must have the same symmetry 
properties and, hence, can be factored into one factor having poles and zeros in 
the left half plane and another factor having the same poles and zeros in the right 
half plane. Thus, it can always be written as 

$$S_X(s) + S_N(s) = F_i(s)F_i(-s) \tag{9-38}$$

Substituting this into (9-37), and again rearranging terms, leads to 

$$\overline{E^2} + \overline{M^2} = \frac{1}{2\pi j}\int_{-j\infty}^{j\infty} \left\{\left[F_i(s)H(s) - \frac{S_X(s)}{F_i(-s)}\right]\left[F_i(-s)H(-s) - \frac{S_X(s)}{F_i(s)}\right] + \frac{S_X(s)S_N(s)}{F_i(s)F_i(-s)}\right\}ds \tag{9-39}$$

It may now be noted that the last term of (9-39) does not involve H(s). Hence, 
the minimum value of $\overline{E^2} + \overline{M^2}$ will occur when the two factors in the first term 
of (9-39) are zero (since the product of these factors cannot be negative). This 
implies, therefore, that 

$$H(s) = \frac{S_X(s)}{F_i(s)F_i(-s)} = \frac{S_X(s)}{S_X(s) + S_N(s)} \tag{9-40}$$

should be the optimum transfer function. This would be true except that (9-40) is 
also symmetrical in the s-plane and, hence, cannot represent a causal system. 
Since the H(s) defined by (9—40) is not causal, the first inclination is to 
simply use the left half plane poles and zeros of (9—40) to define a causal system. 
This would appear to be analogous to eliminating the negative time portion of 
s(t, — t) in the matched filter of the previous section. Unfortunately, the problem 
is not quite that simple, because in this case the total random process at the system 
input, X(t) + N(t), is not white. If it were white, its autocorrelation function 
would be a $\delta$ function and, hence, all future values of the input would be uncorrelated 
with the present and past values. Thus, a system that could not respond to 
future inputs (that is, a causal system) would not be ignoring any information that 
might lead to a better estimate of the signal. It appears, therefore, that the first 
step in obtaining a causal system should be to transform the spectral density of 
signal plus noise into white noise. Hence, a whitening filter is needed. 
From (9-38), it is apparent that if one had a filter with a transfer function of 

$$H_1(s) = \frac{1}{F_i(s)} \tag{9-41}$$

then the output of this filter would be white, because 

$$[S_X(s) + S_N(s)]H_1(s)H_1(-s) = \frac{S_X(s) + S_N(s)}{F_i(s)F_i(-s)} = 1$$

Furthermore, $H_1(s)$ would be causal because $F_i(s)$ by definition has only left half 
plane poles and zeros. Thus, $H_1(s)$ is the whitening filter for the input signal plus 
noise. 

Next, look once more at the factor in (9-39) that was set equal to zero; that is, 


$$F_i(s)H(s) - \frac{S_X(s)}{F_i(-s)}$$

The source of the right half plane poles is the second term of this factor, but that 
term can be broken up (by means of a partial fraction expansion) into the sum of 
one term having only left half plane poles and one having only right half plane 
poles. Thus, write this factor as 

$$F_i(s)H(s) - \frac{S_X(s)}{F_i(-s)} = F_i(s)H(s) - \left[\frac{S_X(s)}{F_i(-s)}\right]_L - \left[\frac{S_X(s)}{F_i(-s)}\right]_R \tag{9-42}$$

where the sub L implies left half plane poles only and the sub R implies right half 
plane poles only. It is now clear that it is not possible to make this entire factor 
zero with a causal H(s), and that the smallest value that it can have is obtained by 
making the difference between the first two terms of the right side of (9-42) equal 
to zero. That is, let 

$$F_i(s)H(s) - \left[\frac{S_X(s)}{F_i(-s)}\right]_L = 0$$

or 

$$H(s) = \frac{1}{F_i(s)}\left[\frac{S_X(s)}{F_i(-s)}\right]_L \tag{9-43}$$

Note that the first factor of (9-43) is $H_1(s)$, the whitening filter. Thus, the elimination 
of the noncausal parts of the second factor represents the best that can be 
done in minimizing the total mean-square error. 

The optimum filter, which minimizes total mean-square error, is often referred 
to as the Wiener filter. It can be considered as a cascade of two parts, as shown 
in Figure 9-12. The first part is the whitening filter $H_1(s)$, while the second part, 
$H_2(s)$, does the actual filtering. Often $H_1(s)$ and $H_2(s)$ have common factors that 
cancel to yield an H(s) that is simpler than might be expected (and easier to build 
than either factor). 

As an example of the Wiener filter, consider a signal having a spectral density 
of 

$$S_X(s) = \frac{-1}{s^2 - 1}$$

and noise with a spectral density of 

$$S_N(s) = \frac{-1}{s^2 - 4}$$

[Figure 9-12: cascade of $H_1(s) = 1/F_i(s)$ and $H_2(s) = [S_X(s)/F_i(-s)]_L$, with input 
$S_X(s) + S_N(s)$ and output $S_Y(s) + S_M(s)$; $H(s) = H_1(s)H_2(s)$ and 
$F_i(s)F_i(-s) = S_X(s) + S_N(s)$.] 

Figure 9-12 The optimum Wiener filter. 


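Thus, 

$$F_i(s)F_i(-s) = S_X(s) + S_N(s) = \frac{-1}{s^2 - 1} + \frac{-1}{s^2 - 4} = \frac{-(2s^2 - 5)}{(s^2 - 1)(s^2 - 4)}$$

from which it follows that 

$$F_i(s) = \frac{\sqrt{2}\,(s + \sqrt{2.5})}{(s + 1)(s + 2)}$$

Therefore, the whitening filter is 

$$H_1(s) = \frac{1}{F_i(s)} = \frac{(s + 1)(s + 2)}{\sqrt{2}\,(s + \sqrt{2.5})}$$
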
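The second filter section is obtained readily from 

$$\frac{S_X(s)}{F_i(-s)} = \frac{-1}{s^2 - 1}\cdot\frac{(-s + 1)(-s + 2)}{\sqrt{2}\,(-s + \sqrt{2.5})} = \frac{s - 2}{\sqrt{2}\,(s + 1)(s - \sqrt{2.5})}$$

which may be broken up by means of a partial fraction expansion into 

$$\frac{S_X(s)}{F_i(-s)} = \frac{0.822}{s + 1} - \frac{0.115}{s - \sqrt{2.5}}$$

Hence, 

$$H_2(s) = \left[\frac{S_X(s)}{F_i(-s)}\right]_L = \frac{0.822}{s + 1}$$

The final optimum filter is 

$$H(s) = H_1(s)H_2(s) = \frac{(s + 1)(s + 2)}{\sqrt{2}\,(s + \sqrt{2.5})}\cdot\frac{0.822}{s + 1} = \frac{0.582\,(s + 2)}{s + \sqrt{2.5}} \tag{9-47}$$
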

Note that the final optimum filter is simpler than the whitening filter and can be 
built as an RC circuit. 
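
The factorization and the partial fraction coefficients of this example can be spot-checked numerically. The sketch below is a check under stated assumptions rather than part of the derivation: the test points s0 and s1 are arbitrary, and all function names are illustrative.

```python
import math

R = math.sqrt(2.5)

def S_X(s):
    return -1.0 / (s * s - 1.0)

def S_N(s):
    return -1.0 / (s * s - 4.0)

def F_i(s):
    """Left half plane spectral factor of S_X(s) + S_N(s)."""
    return math.sqrt(2.0) * (s + R) / ((s + 1.0) * (s + 2.0))

# Check F_i(s) F_i(-s) = S_X(s) + S_N(s) at an arbitrary point on the j-omega axis.
s0 = 0.7j
factor_gap = abs(F_i(s0) * F_i(-s0) - (S_X(s0) + S_N(s0)))

# Residues of S_X(s)/F_i(-s) = (s - 2) / (sqrt(2) (s + 1)(s - R)) at s = -1 and s = R.
res_left = (-1.0 - 2.0) / (math.sqrt(2.0) * (-1.0 - R))
res_right = (R - 2.0) / (math.sqrt(2.0) * (R + 1.0))

# Check the partial fraction expansion at an arbitrary complex point.
s1 = 0.3 + 0.4j
pf_gap = abs(S_X(s1) / F_i(-s1) - (res_left / (s1 + 1.0) + res_right / (s1 - R)))

# Gain of the final filter H(s) = gain (s + 2) / (s + sqrt(2.5)).
gain = res_left / math.sqrt(2.0)
print(f"left residue = {res_left:.4f}, right residue = {res_right:.4f}, gain = {gain:.4f}")
```

The unrounded gain differs slightly in the third decimal place from the value obtained by dividing the rounded residue by the square root of 2.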




The remaining problem is that of evaluating the performance of the optimum 
filter; that is, to determine the actual value of the minimum mean-square error. 
This problem is greatly simplified by recognizing that in an optimum system of 
this sort, the error that remains must be uncorrelated with the actual output of the 
system. If this were not true it would be possible to perform some further linear 
operation on the output and obtain a still smaller error. Thus, the minimum mean- 
square error is simply the difference between the mean-square value of the input 
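signal component and the mean-square value of the total filter output. That is, 

$$\left(\overline{E^2} + \overline{M^2}\right)_{\min} = \frac{1}{2\pi j}\int_{-j\infty}^{j\infty} S_X(s)\,ds - \frac{1}{2\pi j}\int_{-j\infty}^{j\infty} [S_X(s) + S_N(s)]H(s)H(-s)\,ds \tag{9-48}$$

when H(s) is as given by (9-43). 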

The above result can be used to evaluate the minimum mean-square error that 
is achieved by the Wiener filter described by (9-47). The first integral in (9-48) 
is evaluated easily by using either Table 7-1 or by summing the residues. Thus, 

$$\frac{1}{2\pi j}\int_{-j\infty}^{j\infty} S_X(s)\,ds = \frac{1}{2\pi j}\int_{-j\infty}^{j\infty} \frac{-1}{s^2 - 1}\,ds = 0.5$$

The second integral is similarly evaluated as 

$$\frac{1}{2\pi j}\int_{-j\infty}^{j\infty} [S_X(s) + S_N(s)]H(s)H(-s)\,ds = \frac{1}{2\pi j}\int_{-j\infty}^{j\infty} \frac{-(2s^2 - 5)}{(s^2 - 1)(s^2 - 4)}\cdot\frac{(0.582)^2(s^2 - 4)}{s^2 - 2.5}\,ds$$
$$= \frac{1}{2\pi j}\int_{-j\infty}^{j\infty} \frac{-2(0.582)^2}{s^2 - 1}\,ds = 0.339$$

The minimum mean-square error now becomes 

$$\left(\overline{E^2} + \overline{M^2}\right)_{\min} = 0.5 - 0.339 = 0.161$$

It is of interest to compare this value with the mean-square error that would result 
if no filter were used. With no filtering there would be no signal error and the 
total mean-square error would be the mean-square value of the noise. Thus 

$$\overline{E^2} + \overline{M^2} = \overline{N^2} = \frac{1}{2\pi j}\int_{-j\infty}^{j\infty} \frac{-1}{s^2 - 4}\,ds = 0.25$$


and it is seen that the use of the filter has substantially reduced the total error. 
This reduction would have been even more pronounced had the input noise had a 
wider bandwidth. 
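
The numbers in this example can be checked numerically. The sketch below is not part of the original text. It assumes the spectral densities implied by the integrals above, $S_X(s) = -1/(s^2-1)$ and $S_N(s) = -1/(s^2-4)$, and a filter of the form $H(s) = K(s+2)/(s+\sqrt{2.5})$ with the unrounded gain $K = 3/[2(1+\sqrt{2.5})] \approx 0.581$. The helper `mean_square` and all variable names are illustrative.

```python
import math

def mean_square(density, n=20000):
    """Approximate (1/(2*pi)) * integral of density(w) over all real w.

    The substitution w = tan(theta) maps the infinite range onto
    (-pi/2, pi/2); the midpoint rule then avoids both endpoints.
    """
    h = math.pi / n
    total = 0.0
    for k in range(n):
        w = math.tan(-math.pi / 2 + (k + 0.5) * h)
        total += density(w) * (1.0 + w * w)   # dw = (1 + w^2) d(theta)
    return total * h / (2.0 * math.pi)

C = math.sqrt(2.5)
K = 3.0 / (2.0 * (1.0 + C))      # assumed unrounded filter gain, about 0.581

def s_x(w):                      # signal spectral density on s = jw
    return 1.0 / (w * w + 1.0)

def s_n(w):                      # noise spectral density on s = jw
    return 1.0 / (w * w + 4.0)

def h_filter(w):                 # H(jw) for the assumed optimum filter
    return K * (1j * w + 2.0) / (1j * w + C)

signal_power = mean_square(s_x)
output_power = mean_square(lambda w: (s_x(w) + s_n(w)) * abs(h_filter(w)) ** 2)
mmse_orthogonal = signal_power - output_power

# Direct evaluation: signal error through (1 - H) plus noise through H.
mmse_direct = mean_square(
    lambda w: s_x(w) * abs(1.0 - h_filter(w)) ** 2
    + s_n(w) * abs(h_filter(w)) ** 2
)
noise_only = mean_square(s_n)

print(round(signal_power, 3), round(output_power, 3))
print(round(mmse_orthogonal, 3), round(mmse_direct, 3), round(noise_only, 3))
```

Under these assumptions the two error calculations agree at about 0.162. The value 0.161 quoted in the text follows from rounding the gain to 0.582 before squaring it.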
Exercise 9-6.1 

A random signal has a spectral density of 

$$S_X(s) = \frac{-1}{s^2-1}$$

and is combined with noise having a spectral density of 

$$S_N(s) = \frac{s^2}{s^2-1}$$


Find the minimum mean-square error between the input signal and 
the total filter output that can be achieved with any linear, causal filter. 


Answer: 0.375 


Exercise 9-6.2 

A random signal has a spectral density of 

$$S_X(s) = \frac{-2s^2}{s^4 - 13s^2 + 36}$$

and is combined with white noise having a spectral density of 1.0. Find 
the poles and zeros of the optimum causal Wiener filter that minimizes 
the mean-square error between the input signal and the total filter output. 

Answers: $0,\ -\sqrt{3},\ -2\sqrt{3}$ 





PROBLEMS 


9-2.1 For each of the situations listed below, indicate whether the appropriate 
criterion of optimality is maximum signal-to-noise ratio or minimum 
mean-square error. 

a) An automatic control system subject to random disturbances 

b) An aircraft flight-control system 

c) A pulse radar system 

d) A police speed radar system 

e) A particle detector for measuring nuclear radiation 

f) A passive sonar system for detecting underwater sounds 


9-2.2 A signal consisting of a steady-state sinusoid having a peak value of 2 V 
and a frequency of 80 Hz is combined with white noise having a spectral 
density of 0.01 V²/Hz. A single-section RC filter having a transfer function of 

$$H(\omega) = \frac{b}{b + j\omega}$$

is used to extract the signal from the noise. 

a) Determine the output signal-to-noise ratio if the half-power bandwidth 
of the filter is 10 Hz. 

b) Repeat if the filter half-power bandwidth is 100 Hz. 

c) Repeat if the filter half-power bandwidth is 1000 Hz. 


9-3.1 The impulse response, h(t), of the system shown below is causal and the 
input noise N(t) is zero-mean, Gaussian and white. 

a) Prove that the output M(t) is independent of N(t + τ) for all τ > 0 
(that is, future values of the input) but is not independent of 
N(t + τ) for τ ≤ 0 (that is, past and present values of the input). 

b) Prove that the statement in (a) is not true if the system is noncausal. 


9-4.1 a) For the signal and noise of Problem 9-2.2, find the filter half-power 
bandwidth that maximizes the output signal-to-noise ratio. 

b) Find the value of the maximum output signal-to-noise ratio. 
9-4.2 The signal s(t) below is combined with white noise having a spectral 
density of 2 V²/Hz. It is desired to maximize the signal-to-noise ratio at the 
output of the RC filter, also shown below, at t = 0.01 seconds. Find the 
value of RC in the filter that achieves this result. 

[Figure: the signal s(t), time axis marked at 0 and 0.01, and the RC filter] 


9-4.3 A random signal having a spectral density of 

$$S_X(\omega) = \begin{cases} 2 & |\omega| \le 10 \\ 0 & \text{elsewhere} \end{cases}$$

is observed in the presence of white noise having a spectral density of 
2 V²/Hz. Both are applied to the input of a lowpass RC filter having a 
transfer function of 

$$H(\omega) = \frac{b}{j\omega + b}$$

a) Find the value of b that minimizes the mean-square error between the 
input signal and the total filter output. 

b) Find the value of the minimum mean-square error. 


9-4.4 A random signal having a spectral density of 

$$S_X(\omega) = \begin{cases} 1 - \dfrac{|\omega|}{10} & |\omega| \le 10 \\ 0 & \text{elsewhere} \end{cases}$$

is observed in the presence of white noise having a spectral density of 
0.1 V²/Hz. Both are applied to the input of an ideal lowpass filter whose 
transfer function is 

$$H(\omega) = \begin{cases} 1 & |\omega| \le 2\pi W \\ 0 & \text{elsewhere} \end{cases}$$

a) Find the value of W that maximizes the ratio of signal power to noise 
power at the output of the filter. 

b) Find the value of W that minimizes the mean-square error between 
the input signal and the total filter output. 


9-5.1 [Figure: the signal s(t), time axis marked at -2, 0, and 2] 

a) The signal shown above is combined with white noise having a spectral 
density of 0.1 V²/Hz. Find the impulse response of the causal 
filter that will maximize the output signal-to-noise ratio at $t_0 = 2$. 

b) Find the value of the maximum output signal-to-noise ratio. 

c) Repeat (a) and (b) for $t_0 = 0$. 


9-5.2 A signal has the form 

$$s(t) = t\,e^{-t}\,u(t)$$

and is combined with white noise having a spectral density of 0.005 
V²/Hz. 

a) What is the largest output signal-to-noise ratio that can be achieved 
with any linear filter? 

b) For what observation time $t_0$ should a matched filter be constructed 
to achieve an output signal-to-noise ratio that is 0.9 of that determined 
in (a)? 


9-5.3 A power signal consists of rectangular pulses having an amplitude of 1 V 
and a duration of 1 millisecond repeated periodically at a rate of 100 
pulses per second. This signal is observed in the presence of white noise 
having a spectral density of 0.001 V²/Hz. 

a) If a causal filter is to be matched to N successive pulses, find the 
output signal-to-noise ratio that can be achieved as a function of N. 

b) How many pulses must the filter be matched to in order to achieve 
an output signal-to-noise ratio of 100? 

c) Sketch a block diagram showing how such a matched filter might be 
constructed using a finite-time integrator and a transversal filter. 


9-5.4 Below is a block diagram of another type of filter that might be used to 
extract the pulses of Problem 9-5.3. This is a recursive filter that does 
not distort the shape of the pulse as a matched filter does. 

[Figure: block diagram of the recursive filter, containing a delay of 0.01 second] 

a) Find the largest value the gain parameter A can have in order for the 
filter to be stable. 

b) Find a relationship between the output signal-to-noise ratio and the 
gain parameter A. 

c) Find the value of A that is required to achieve an output signal-to-noise 
ratio of 100. 


9-5.5 The diagram below represents a particle detector connected to an amplifier 
with a matched filter in its output. The particle detector may be modeled 
as having a source impedance of 1 MΩ and producing an open-circuit 
voltage for each particle of 

$$s(t) = 10^{-4}\,e^{-t}\,u(t)$$

[Figure: particle detector feeding an amplifier and filter, which produce the output] 

The input circuit of the amplifier may be modeled as a 1 MΩ resistor in 
parallel with a current source for which the current is from a white noise 
source having a spectral density of $10^{[\text{exponent illegible}]}$ A²/Hz. The amplifier may be 
assumed to have an output impedance that is negligibly small compared 
to the input impedance of the filter. 

a) Find the impulse response of the filter that will maximize the output 
signal-to-noise ratio at the time at which the output of the particle 
detector drops to one-hundredth of its maximum value. 

b) Find the value of the maximum output signal-to-noise ratio. 


9-6.1 A signal is from a stationary random process having a spectral density of 

$$S_X(\omega) = \frac{16}{[\text{denominator illegible}]}$$

and is combined with white noise having a spectral density of 0.1 V²/Hz. 

a) Find the transfer function of the noncausal linear filter that will minimize 
the mean-square error between the input signal and the total 
filter output. 

b) Find the value of the minimum mean-square error that is achieved 
with this filter. 


9-6.2 Repeat Problem 9-6.1 using a causal linear filter. 


9-6.3 A signal is from a stationary random process having a spectral density of 

$$S_X(\omega) = \frac{4}{\omega^2 + 4}$$

and is combined with noise having a spectral density of 

$$S_N(\omega) = \frac{[\text{numerator illegible}]}{\omega^2 + 4}$$

a) Find the transfer function of the causal linear filter that minimizes the 
mean-square error between the input signal and the total filter output. 

b) Find the value of the minimum mean-square error. 







9-6.4 [Figure: vibration sensor with 10 kΩ internal impedance, followed by an amplifier and a filter] 

The block diagram above illustrates a system for measuring vibrations 
with a sensitive vibration sensor having an internal impedance of 10,000 
Ω. The open-circuit signal produced by this sensor comes from a stationary 
random process having a spectral density of 

$$S_X(\omega) = \frac{[\text{numerator illegible}]}{\omega^4 + 13\omega^2 + 36}\ \text{V}^2/\text{Hz}$$

The output of the vibration sensor is connected to the input of a broadband 
amplifier whose input circuit may be modeled by a resistance of 
10,000 Ω in parallel with a noise current source in which the current is a 
sample function from a white noise source having a spectral density of 
$10^{[\text{exponent illegible}]}$ A²/Hz. The amplifier has a voltage gain of 10 and has an output 
impedance that is negligibly small compared to the input impedance of 
the causal linear filter connected to the output. 

a) Find the transfer function of the output filter that will minimize the 
mean-square error between the signal into the filter and the total output 
from the filter. Normalize the filter gain so that its maximum gain 
is unity. 

b) Find the ratio of the minimum mean-square error to the mean-square 
value of the signal into the filter. 


References 


See the References for Chapter 1. Of particular interest for the material of this chapter are the 
books by Davenport and Root and by Lanning and Battin. 
Mathematical Tables 


Table A-1 Trigonometric Identities 

$$\sin(A \pm B) = \sin A\cos B \pm \cos A\sin B$$
$$\cos(A \pm B) = \cos A\cos B \mp \sin A\sin B$$
$$\cos A\cos B = \tfrac{1}{2}[\cos(A+B) + \cos(A-B)]$$
$$\sin A\sin B = \tfrac{1}{2}[\cos(A-B) - \cos(A+B)]$$
$$\sin A\cos B = \tfrac{1}{2}[\sin(A+B) + \sin(A-B)]$$
$$\sin A + \sin B = 2\sin\tfrac{1}{2}(A+B)\cos\tfrac{1}{2}(A-B)$$
$$\sin A - \sin B = 2\sin\tfrac{1}{2}(A-B)\cos\tfrac{1}{2}(A+B)$$
$$\cos A + \cos B = 2\cos\tfrac{1}{2}(A+B)\cos\tfrac{1}{2}(A-B)$$
$$\cos A - \cos B = -2\sin\tfrac{1}{2}(A+B)\sin\tfrac{1}{2}(A-B)$$
$$\sin 2A = 2\sin A\cos A$$
$$\cos 2A = 2\cos^2 A - 1 = 1 - 2\sin^2 A = \cos^2 A - \sin^2 A$$
$$\sin\tfrac{1}{2}A = \sqrt{\tfrac{1}{2}(1-\cos A)}$$
$$\cos\tfrac{1}{2}A = \sqrt{\tfrac{1}{2}(1+\cos A)}$$
$$\sin^2 A = \tfrac{1}{2}(1-\cos 2A)$$
$$\cos^2 A = \tfrac{1}{2}(1+\cos 2A)$$
$$\sin x = \frac{e^{jx}-e^{-jx}}{2j} \quad\text{and}\quad \cos x = \frac{e^{jx}+e^{-jx}}{2}$$
$$e^{jx} = \cos x + j\sin x$$
$$A\cos(\omega t+\phi_1) + B\cos(\omega t+\phi_2) = C\cos(\omega t+\phi_3)$$

where 

$$C = \sqrt{A^2 + B^2 + 2AB\cos(\phi_2-\phi_1)}$$
$$\phi_3 = \tan^{-1}\left[\frac{A\sin\phi_1 + B\sin\phi_2}{A\cos\phi_1 + B\cos\phi_2}\right]$$
$$\sin(\omega t+\phi) = \cos(\omega t+\phi-90^\circ)$$


Table A-2 Indefinite Integrals 

$$\int \sin ax\,dx = -\frac{1}{a}\cos ax \qquad \int \cos ax\,dx = \frac{1}{a}\sin ax$$
$$\int \sin^2 ax\,dx = \frac{x}{2} - \frac{\sin 2ax}{4a}$$
$$\int x\sin ax\,dx = \frac{1}{a^2}(\sin ax - ax\cos ax)$$
$$\int x^2\sin ax\,dx = \frac{1}{a^3}(2ax\sin ax + 2\cos ax - a^2x^2\cos ax)$$
$$\int x\cos ax\,dx = \frac{1}{a^2}(\cos ax + ax\sin ax)$$
$$\int x^2\cos ax\,dx = \frac{1}{a^3}(2ax\cos ax - 2\sin ax + a^2x^2\sin ax)$$
$$\int \sin ax\sin bx\,dx = \frac{\sin(a-b)x}{2(a-b)} - \frac{\sin(a+b)x}{2(a+b)} \qquad a^2 \ne b^2$$
$$\int \sin ax\cos bx\,dx = -\left[\frac{\cos(a-b)x}{2(a-b)} + \frac{\cos(a+b)x}{2(a+b)}\right] \qquad a^2 \ne b^2$$
$$\int \cos ax\cos bx\,dx = \frac{\sin(a-b)x}{2(a-b)} + \frac{\sin(a+b)x}{2(a+b)} \qquad a^2 \ne b^2$$
$$\int e^{ax}\,dx = \frac{1}{a}e^{ax}$$
$$\int x\,e^{ax}\,dx = \frac{e^{ax}}{a^2}(ax - 1)$$
$$\int x^2 e^{ax}\,dx = \frac{e^{ax}}{a^3}(a^2x^2 - 2ax + 2)$$
$$\int e^{ax}\sin bx\,dx = \frac{e^{ax}}{a^2+b^2}(a\sin bx - b\cos bx)$$
$$\int e^{ax}\cos bx\,dx = \frac{e^{ax}}{a^2+b^2}(a\cos bx + b\sin bx)$$


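Table A-3 Definite Integrals 

$$\int_0^\infty x^n e^{-ax}\,dx = \frac{n!}{a^{n+1}} = \frac{\Gamma(n+1)}{a^{n+1}}$$

where $\Gamma(u) = \int_0^\infty z^{u-1}e^{-z}\,dz$ (Gamma function) 

$$\int_0^\infty e^{-r^2x^2}\,dx = \frac{\sqrt{\pi}}{2r}$$
$$\int_0^\infty x\,e^{-r^2x^2}\,dx = \frac{1}{2r^2}$$
$$\int_0^\infty x^2 e^{-r^2x^2}\,dx = \frac{\sqrt{\pi}}{4r^3}$$
$$\int_0^\infty x^n e^{-r^2x^2}\,dx = \frac{\Gamma[(n+1)/2]}{2r^{n+1}}$$
$$\int_0^\infty \frac{\sin ax}{x}\,dx = \frac{\pi}{2},\ 0,\ -\frac{\pi}{2} \qquad\text{for } a>0,\ a=0,\ a<0$$
$$\int_0^\infty \frac{\sin^2 x}{x^2}\,dx = \frac{\pi}{2}$$
$$\int_0^\infty \frac{\sin^2 ax}{x^2}\,dx = |a|\frac{\pi}{2}$$
$$\int_0^\pi \sin^2 mx\,dx = \int_0^\pi \sin^2 x\,dx = \int_0^\pi \cos^2 mx\,dx = \int_0^\pi \cos^2 x\,dx = \frac{\pi}{2} \qquad m \text{ an integer}$$
$$\int_0^\pi \sin mx\sin nx\,dx = \int_0^\pi \cos mx\cos nx\,dx = 0 \qquad m \ne n;\ m, n \text{ integers}$$
$$\int_0^\pi \sin mx\cos nx\,dx = \begin{cases} \dfrac{2m}{m^2-n^2} & \text{if } m+n \text{ odd} \\ 0 & \text{if } m+n \text{ even} \end{cases}$$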


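Table A-4 Fourier Transforms 

Description | f(t) | F(ω) 
Definition | $f(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty} F(\omega)e^{j\omega t}\,d\omega$ | $F(\omega) = \int_{-\infty}^{\infty} f(t)e^{-j\omega t}\,dt$ 
Reversal | $f(-t)$ | $F(-\omega)$ 
Symmetry | $F(t)$ | $2\pi f(-\omega)$ 
Scaling | $f(at)$ | $\frac{1}{|a|}F\left(\frac{\omega}{a}\right)$ 
Delay | $f(t-t_0)$ | $e^{-j\omega t_0}F(\omega)$ 
Complex conjugate | $f^*(t)$ | $F^*(-\omega)$ 
Time differentiation | $\frac{d^n f(t)}{dt^n}$ | $(j\omega)^n F(\omega)$ 
Frequency differentiation | $t^n f(t)$ | $(j)^n\frac{d^n F(\omega)}{d\omega^n}$ 
Time integration | $\int_{-\infty}^{t} f(\xi)\,d\xi$ | $\frac{1}{j\omega}F(\omega) + \pi F(0)\delta(\omega)$ 
Time convolution | $\int_{-\infty}^{\infty} f_1(\lambda)f_2(t-\lambda)\,d\lambda$ | $F_1(\omega)F_2(\omega)$ 
Frequency convolution | $f_1(t)f_2(t)$ | $\frac{1}{2\pi}\int_{-\infty}^{\infty} F_1(\xi)F_2(\omega-\xi)\,d\xi$ 
Parseval's theorem | $\int_{-\infty}^{\infty} f_1(t)f_2(t)\,dt = \frac{1}{2\pi}\int_{-\infty}^{\infty} F_1(\omega)F_2(-\omega)\,d\omega$ | 
Modulation | $e^{j\omega_0 t}f(t)$ | $F(\omega-\omega_0)$ 
Unit impulse | $\delta(t)$ | $1$ 
Unit step | $u(t)$ | $\pi\delta(\omega) + \frac{1}{j\omega}$ 
Signum function | $\operatorname{sgn} t$ | $\frac{2}{j\omega}$ 
Sine | $\sin\omega_0 t$ | $-j\pi[\delta(\omega-\omega_0) - \delta(\omega+\omega_0)]$ 
Cosine | $\cos\omega_0 t$ | $\pi[\delta(\omega-\omega_0) + \delta(\omega+\omega_0)]$ 
Rectangular pulse | [Figure: unit-height pulse, $-T/2 < t < T/2$] | $T\,\frac{\sin(\omega T/2)}{\omega T/2}$ 
Triangular pulse | [Figure: unit-height triangle, $-T/2 < t < T/2$] | $\frac{T}{2}\left[\frac{\sin(\omega T/4)}{\omega T/4}\right]^2$ 
Gaussian pulse | $e^{-a^2t^2}$ | $\frac{\sqrt{\pi}}{a}\,e^{-\omega^2/4a^2}$ 
Fourier series | $\sum_{n=-\infty}^{\infty} a_n e^{j(2\pi nt/T)}$ | $2\pi\sum_{n=-\infty}^{\infty} a_n\,\delta\left(\omega - \frac{2\pi n}{T}\right)$ 
Impulse train | $\sum_{n=-\infty}^{\infty} \delta(t-nT)$ | $\frac{2\pi}{T}\sum_{n=-\infty}^{\infty} \delta\left(\omega - \frac{2\pi n}{T}\right)$ 
Constant | $K$ | $2\pi K\delta(\omega)$ 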


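Table A-5 One-Sided Laplace Transforms 

Description | f(t) | F(s) 
Definition | $f(t) = \frac{1}{2\pi j}\int_{c-j\infty}^{c+j\infty} F(s)e^{st}\,ds$ | $F(s) = \int_0^\infty f(t)e^{-st}\,dt$ 
Derivative | $f'(t) = \frac{df(t)}{dt}$ | $sF(s) - f(0)$ 
2nd derivative | $f''(t) = \frac{d^2 f(t)}{dt^2}$ | $s^2F(s) - sf(0) - f'(0)$ 
Integral | $\int_0^t f(\xi)\,d\xi$ | $\frac{1}{s}F(s)$ 
t multiplication | $t f(t)$ | $-\frac{dF(s)}{ds}$ 
Division by t | $\frac{1}{t}f(t)$ | $\int_s^\infty F(\xi)\,d\xi$ 
Delay | $f(t-t_0)u(t-t_0)$ | $e^{-st_0}F(s)$ 
Exponential decay | $e^{-at}f(t)$ | $F(s+a)$ 
Scale change | $f(at),\ a>0$ | $\frac{1}{a}F\left(\frac{s}{a}\right)$ 
Convolution | $\int_0^t f_1(\lambda)f_2(t-\lambda)\,d\lambda$ | $F_1(s)F_2(s)$ 
Initial value | $f(0^+)$ | $\lim_{s\to\infty} sF(s)$ 
Final value | $f(\infty)$ | $\lim_{s\to 0} sF(s)$ [F(s): left-half plane poles only] 
Impulse | $\delta(t)$ | $1$ 
Step | $u(t)$ | $\frac{1}{s}$ 
Ramp | $t\,u(t)$ | $\frac{1}{s^2}$ 
nth order ramp | $t^n u(t)$ | $\frac{n!}{s^{n+1}}$ 
Exponential | $e^{-\alpha t}u(t)$ | $\frac{1}{s+\alpha}$ 
Damped ramp | $t\,e^{-\alpha t}u(t)$ | $\frac{1}{(s+\alpha)^2}$ 
Sine wave | $\sin(\beta t)u(t)$ | $\frac{\beta}{s^2+\beta^2}$ 
Cosine wave | $\cos(\beta t)u(t)$ | $\frac{s}{s^2+\beta^2}$ 
Damped sine | $e^{-\alpha t}\sin(\beta t)u(t)$ | $\frac{\beta}{(s+\alpha)^2+\beta^2}$ 
Damped cosine | $e^{-\alpha t}\cos(\beta t)u(t)$ | $\frac{s+\alpha}{(s+\alpha)^2+\beta^2}$ 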
Frequently Encountered Probability Distributions 


There are a number of probability distributions that occur quite frequently in the 
application of probability theory to practical problems. Mathematical expressions 
for the most common of these distributions are collected here along with their 
most important parameters. 

The following notation is used throughout: 

$\Pr(x)$: probability of the event x occurring. 

$f_X(x)$: probability density function of the random variable X at the point x. 

$\overline{X} = E\{X\}$: mean of the random variable X. 

$\sigma_X^2 = E\{(X - \overline{X})^2\}$: variance of the random variable X. 

$\phi(u) = \int_{-\infty}^{\infty} f_X(x)e^{jux}\,dx$: characteristic function of the random variable X. 


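Discrete Probability Functions 


Bernoulli (Special case of Binomial) 

$$\Pr(x) = \begin{cases} p & x = 1 \\ q = 1-p & x = 0 \\ 0 & \text{otherwise} \end{cases} \qquad 0 < p < 1$$
$$f_X(x) = p\,\delta(x-1) + q\,\delta(x)$$
$$\overline{X} = p$$
$$\sigma_X^2 = pq$$
$$\phi(u) = 1 - p + pe^{ju}$$

Binomial 

$$\Pr(x) = \begin{cases} \binom{n}{x}p^x q^{n-x} & x = 0, 1, 2, \ldots, n \\ 0 & \text{otherwise} \end{cases}$$
$$0 < p < 1 \qquad q = 1-p \qquad n = 1, 2, 3, \ldots$$
$$f_X(x) = \sum_{k=0}^{n}\binom{n}{k}p^k q^{n-k}\,\delta(x-k)$$
$$\overline{X} = np$$
$$\sigma_X^2 = npq$$
$$\phi(u) = [1 - p + pe^{ju}]^n$$

Pascal 

$$\Pr(x) = \begin{cases} \binom{x-1}{n-1}p^n q^{x-n} & x = n, n+1, \ldots \\ 0 & \text{otherwise} \end{cases}$$
$$0 < p < 1 \qquad q = 1-p \qquad n = 1, 2, 3, \ldots$$
$$\overline{X} = np^{-1}$$
$$\sigma_X^2 = nqp^{-2}$$
$$\phi(u) = p^n e^{jnu}[1 - qe^{ju}]^{-n}$$

Poisson 

$$\Pr(x) = \frac{a^x e^{-a}}{x!} \qquad x = 0, 1, 2, \ldots$$
$$a > 0$$
$$\overline{X} = a$$
$$\sigma_X^2 = a$$
$$\phi(u) = e^{a(e^{ju}-1)}$$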
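
The tabulated means and variances of the discrete distributions can be spot-checked by direct summation. The sketch below is an illustration rather than part of the original appendix. It assumes the usual forms of the probability functions (for Pascal, the number of trials needed to obtain the nth success); the helper `moments` and the parameter values are arbitrary choices.

```python
from math import comb, exp, factorial

def moments(pmf, support):
    """Return (mean, variance) of a discrete distribution by direct summation."""
    mean = sum(x * pmf(x) for x in support)
    var = sum((x - mean) ** 2 * pmf(x) for x in support)
    return mean, var

n, p = 5, 0.3
q = 1.0 - p
a = 2.5

binomial = moments(lambda x: comb(n, x) * p**x * q**(n - x), range(n + 1))
# Infinite supports are truncated where the remaining tail is negligible.
pascal = moments(lambda x: comb(x - 1, n - 1) * p**n * q**(x - n), range(n, 400))
poisson = moments(lambda x: a**x * exp(-a) / factorial(x), range(100))

print(binomial, (n * p, n * p * q))        # compare with np, npq
print(pascal, (n / p, n * q / p**2))       # compare with n/p, nq/p^2
print(poisson, (a, a))                     # compare with a, a
```

Each printed pair of computed moments should match the pair of tabulated expressions next to it to within the truncation error.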
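Continuous Distributions 


Beta 

$$f_X(x) = \begin{cases} \dfrac{(a+b-1)!}{(a-1)!\,(b-1)!}\,x^{a-1}(1-x)^{b-1} & 0 < x < 1 \\ 0 & \text{otherwise} \end{cases}$$
$$a > 0 \qquad b > 0$$
$$\overline{X} = \frac{a}{a+b}$$
$$\sigma_X^2 = \frac{ab}{(a+b)^2(a+b+1)}$$

Cauchy 

$$f_X(x) = \frac{1}{\pi}\,\frac{a}{a^2 + (x-b)^2} \qquad -\infty < x < \infty$$
$$a > 0 \qquad -\infty < b < \infty$$

Mean and variance not defined 

$$\phi(u) = e^{jbu - a|u|}$$

Chi-square 

$$f_X(x) = \begin{cases} \left[2^{n/2}\,\Gamma\left(\dfrac{n}{2}\right)\right]^{-1} x^{(n/2)-1}e^{-x/2} & x > 0 \\ 0 & \text{otherwise} \end{cases}$$
$$n = 1, 2, \ldots$$
$$\overline{X} = n$$
$$\sigma_X^2 = 2n$$
$$\phi(u) = (1 - 2ju)^{-n/2}$$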


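Erlang 

$$f_X(x) = \begin{cases} \dfrac{a^n x^{n-1}e^{-ax}}{(n-1)!} & x > 0 \\ 0 & \text{otherwise} \end{cases}$$
$$a > 0 \qquad n = 1, 2, \ldots$$
$$\overline{X} = na^{-1}$$
$$\sigma_X^2 = na^{-2}$$
$$\phi(u) = a^n(a - ju)^{-n}$$

Exponential 

$$f_X(x) = \begin{cases} ae^{-ax} & x > 0 \\ 0 & \text{otherwise} \end{cases}$$
$$a > 0$$
$$\overline{X} = a^{-1}$$
$$\sigma_X^2 = a^{-2}$$
$$\phi(u) = a(a - ju)^{-1}$$

Gamma 

$$f_X(x) = \begin{cases} \dfrac{x^a e^{-x/b}}{\Gamma(a+1)\,b^{a+1}} & x > 0 \\ 0 & \text{otherwise} \end{cases}$$
$$a > -1 \qquad b > 0$$
$$\overline{X} = (a+1)b$$
$$\sigma_X^2 = (a+1)b^2$$
$$\phi(u) = (1 - jbu)^{-a-1}$$

Laplace 

$$f_X(x) = \frac{a}{2}\,e^{-a|x-b|} \qquad -\infty < x < \infty$$
$$a > 0 \qquad -\infty < b < \infty$$
$$\overline{X} = b$$
$$\sigma_X^2 = 2a^{-2}$$
$$\phi(u) = a^2 e^{jbu}(a^2 + u^2)^{-1}$$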


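Log-normal 

$$f_X(x) = \begin{cases} \dfrac{\exp\{-[\ln(x-a) - b]^2/2\sigma^2\}}{\sqrt{2\pi}\,\sigma\,(x-a)} & x \ge a \\ 0 & \text{otherwise} \end{cases}$$
$$\sigma > 0 \qquad -\infty < a < \infty \qquad -\infty < b < \infty$$
$$\overline{X} = a + e^{b + \sigma^2/2}$$
$$\sigma_X^2 = e^{2b+\sigma^2}\left(e^{\sigma^2} - 1\right)$$

Maxwell 

[Density and parameters not recoverable from the source] 

Normal 

$$f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma_X}\,\exp\left[-\frac{(x-\overline{X})^2}{2\sigma_X^2}\right] \qquad -\infty < x < \infty$$
$$\sigma_X > 0 \qquad -\infty < \overline{X} < \infty$$
$$\phi(u) = \exp\left[ju\overline{X} - \frac{u^2\sigma_X^2}{2}\right]$$

Normal-bivariate 

$$f_{XY}(x,y) = \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}}\exp\left\{\frac{-1}{2(1-\rho^2)}\left[\left(\frac{x-\overline{X}}{\sigma_X}\right)^2 + \left(\frac{y-\overline{Y}}{\sigma_Y}\right)^2 - \frac{2\rho(x-\overline{X})(y-\overline{Y})}{\sigma_X\sigma_Y}\right]\right\}$$
$$-\infty < x < \infty \qquad -\infty < y < \infty \qquad \sigma_X > 0 \qquad \sigma_Y > 0 \qquad -1 < \rho < 1$$
$$\phi(u,v) = \exp\left[ju\overline{X} + jv\overline{Y} - \tfrac{1}{2}\left(u^2\sigma_X^2 + 2\rho uv\sigma_X\sigma_Y + v^2\sigma_Y^2\right)\right]$$

Rayleigh 

[Density defined for $x \ge 0$ and zero otherwise; expression and parameters not recoverable from the source] 

Uniform 

$$f_X(x) = \begin{cases} \dfrac{1}{b-a} & a < x < b \\ 0 & \text{otherwise} \end{cases}$$
$$-\infty < a < b < \infty$$
$$\overline{X} = \frac{a+b}{2}$$
$$\sigma_X^2 = \frac{(b-a)^2}{12}$$
$$\phi(u) = \frac{e^{jub} - e^{jua}}{ju(b-a)}$$

Weibull 

[Density and parameters not recoverable from the source] 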
Binomial Coefficients 

Entries are $\binom{n}{m}$ for m = 0 through 9. 

 n  m=0    1    2    3     4     5     6     7     8     9 
 0    1 
 1    1    1 
 2    1    2    1 
 3    1    3    3    1 
 4    1    4    6    4     1 
 5    1    5   10   10     5     1 
 6    1    6   15   20    15     6     1 
 7    1    7   21   35    35    21     7     1 
 8    1    8   28   56    70    56    28     8     1 
 9    1    9   36   84   126   126    84    36     9     1 
10    1   10   45  120   210   252   210   120    45    10 
11    1   11   55  165   330   462   462   330   165    55 
12    1   12   66  220   495   792   924   792   495   220 
13    1   13   78  286   715  1287  1716  1716  1287   715 
14    1   14   91  364  1001  2002  3003  3432  3003  2002 
15    1   15  105  455  1365  3003  5005  6435  6435  5005 
16    1   16  120  560  1820  4368  8008 11440 12870 11440 
17    1   17  136  680  2380  6188 12376 19448 24310 24310 
18    1   18  153  816  3060  8568 18564 31824 43758 48620 
19    1   19  171  969  3876 11628 27132 50388 75582 92378 
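
Any entry of the table can be regenerated from the recurrence $\binom{n}{m} = \binom{n}{m-1}(n-m+1)/m$. The short sketch below is an illustration and not part of the original appendix; the function name `binomial_row` is an arbitrary choice.

```python
def binomial_row(n):
    """Return [C(n, 0), ..., C(n, n)] using C(n, m) = C(n, m-1) * (n-m+1) / m."""
    row = [1]
    for m in range(1, n + 1):
        # The product is always divisible by m, so integer division is exact.
        row.append(row[-1] * (n - m + 1) // m)
    return row

for n in range(20):
    print(n, binomial_row(n)[:10])   # the table lists only m = 0..9
```

The slice `[:10]` mirrors the table, which stops at m = 9; the remaining entries follow from the symmetry of each row.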


Useful Relationships 
Normal Probability Distribution Function 

$$\Phi(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-t^2/2}\,dt; \qquad \Phi(-x) = 1 - \Phi(x)$$

  x    .01    .02    .03    .04    .05    .06    .07    .08 
 0.0  .5040  .5080  .5120  .5160  .5199  .5239  .5279  .5319 
 0.1  .5438  .5478  .5517  .5557  .5596  .5636  .5675  .5714 
 0.2  .5832  .5871  .5910  .5948  .5987  .6026  .6064  .6103 
 0.3  .6217  .6255  .6293  .6331  .6368  .6406  .6443  .6480 
 0.4  .6591  .6628  .6664  .6700  .6736  .6772  .6808  .6844 
 0.5  .6950  .6985  .7019  .7054  .7088  .7123  .7157  .7190 
 0.6  .7291  .7324  .7357  .7389  .7422  .7454  .7486  .7517 
 0.7  .7611  .7642  .7673  .7704  .7734  .7764  .7794  .7823 

  x    .08 
 0.8  .8106 
 0.9  .8365 
 1.0  .8599 
 1.1  .8810 
 1.2  .8997 
 1.3  .9162 
 1.4  .9306 
 1.5  .9429 
 1.6  .9535 
 1.7  .9625 
 1.8  .9699 
 1.9  .9761 
 2.0  .9812 
 2.1  .9854 
 2.2  .9887 
 2.3  .9913 
 2.4  .9934 
 2.5  .9951 
 2.6  .9963 
 2.7  .9973 
 2.8  .9980 
 2.9  .9986 
 3.0  .9990 
 3.1  .9993 
 3.2  .9995 
 3.3  .9996 
 3.4  .9998 
 3.5  .9998 
 3.6  .9999 
 3.7  .9999 
 3.8 1.0000 
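
Values of the normal distribution function that are not tabulated can be computed from the error function, since $\Phi(x) = \tfrac{1}{2}[1 + \operatorname{erf}(x/\sqrt{2})]$. The sketch below is an illustration and not part of the original appendix; it uses `math.erf` from the Python standard library, and the function name `normal_cdf` is an arbitrary choice.

```python
import math

def normal_cdf(x):
    """Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Compare a few computed values with the four-place table.
for x in (0.01, 0.78, 0.88, 1.98):
    print(f"{x:4.2f}  {normal_cdf(x):.4f}")

# Symmetry relation: Phi(-x) + Phi(x) = 1.
print(normal_cdf(-1.0) + normal_cdf(1.0))
```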


APPENDIX E- 





The Q-Function 





$$Q(-x) = 1 - Q(x)$$

 x    0.00        0.01        0.02        0.03        0.04        0.05        0.06        0.07        0.08        0.09 
0.0   0.5000      0.4960      0.4920      0.4880      0.4840      0.4801      0.4761      0.4721      0.4681      0.4641 
0.1   0.4602      0.4562      0.4522      0.4483      0.4443      0.4404      0.4364      0.4325      0.4286      0.4247 
0.2   0.4207      0.4168      0.4129      0.4090      0.4052      0.4013      0.3974      0.3936      0.3897      0.3859 
0.3   0.3821      0.3783      0.3745      0.3707      0.3669      0.3632      0.3594      0.3557      0.3520      0.3483 
0.4   0.3446      0.3409      0.3372      0.3336      0.3300      0.3264      0.3228      0.3192      0.3156      0.3121 
0.5   0.3085      0.3050      0.3015      0.2981      0.2946      0.2912      0.2877      0.2843      0.2810      0.2776 
0.6   0.2743      0.2709      0.2676      0.2643      0.2611      0.2578      0.2546      0.2514      0.2483      0.2451 
0.7   0.2420      0.2389      0.2358      0.2327      0.2297      0.2266      0.2236      0.2206      0.2177      0.2148 
0.8   0.2119      0.2090      0.2061      0.2033      0.2005      0.1977      0.1949      0.1922      0.1894      0.1867 
0.9   0.1841      0.1814      0.1788      0.1762      0.1736      0.1711      0.1685      0.1660      0.1635      0.1611 
1.0   0.1587      0.1562      0.1539      0.1515      0.1492      0.1469      0.1446      0.1423      0.1401      0.1379 
1.1   0.1357      0.1335      0.1314      0.1292      0.1271      0.1251      0.1230      0.1210      0.1190      0.1170 
1.2   0.1151      0.1131      0.1112      0.1093      0.1075      0.1056      0.1038      0.1020      0.1003      0.0985 
1.3   0.0968      0.0951      0.0934      0.0918      0.0901      0.0885      0.0869      0.0853      0.0838      0.0823 
1.4   0.0808      0.0793      0.0778      0.0764      0.0749      0.0735      0.0721      0.0708      0.0694      0.0681 
1.5   0.0668      0.0655      0.0643      0.0630      0.0618      0.0606      0.0594      0.0582      0.0571      0.0559 
1.6   0.0548      0.0537      0.0526      0.0516      0.0505      0.0495      0.0485      0.0475      0.0465      0.0455 
1.7   0.0446      0.0436      0.0427      0.0418      0.0409      0.0401      0.0392      0.0384      0.0375      0.0367 
1.8   0.0359      0.0351      0.0344      0.0336      0.0329      0.0322      0.0314      0.0307      0.0301      0.0294 
1.9   0.0287      0.0281      0.0274      0.0268      0.0262      0.0256      0.0250      0.0244      0.0239      0.0233 
2.0   0.2275E-01  0.2222E-01  0.2169E-01  0.2118E-01  0.2068E-01  0.2018E-01  0.1970E-01  0.1923E-01  0.1876E-01  0.1831E-01 
2.1   0.1786E-01  0.1743E-01  0.1700E-01  0.1659E-01  0.1618E-01  0.1578E-01  0.1539E-01  0.1500E-01  0.1463E-01  0.1426E-01 
2.2   0.1390E-01  0.1355E-01  0.1321E-01  0.1287E-01  0.1255E-01  0.1222E-01  0.1191E-01  0.1160E-01  0.1130E-01  0.1101E-01 
2.3   0.1072E-01  0.1044E-01  0.1017E-01  0.9903E-02  0.9642E-02  0.9387E-02  0.9137E-02  0.8894E-02  0.8656E-02  0.8424E-02 
2.4   0.8198E-02  0.7976E-02  0.7760E-02  0.7549E-02  0.7344E-02  0.7143E-02  0.6947E-02  0.6756E-02  0.6569E-02  0.6387E-02 
2.5   0.6210E-02  0.6037E-02  0.5868E-02  0.5703E-02  0.5543E-02  0.5386E-02  0.5234E-02  0.5085E-02  0.4940E-02  0.4799E-02 

 x    0.00        0.01        0.02        0.03        0.04        0.05        0.06        0.07        0.08        0.09 
3.0   0.1350E-02  0.1306E-02  0.1264E-02  0.1223E-02  0.1183E-02  0.1144E-02  0.1107E-02  0.1070E-02  0.1035E-02  0.1001E-02 
3.1   0.9676E-03  0.9354E-03  0.9043E-03  0.8740E-03  0.8447E-03  0.8164E-03  0.7888E-03  0.7622E-03  0.7364E-03  0.7114E-03 
3.2   0.6871E-03  0.6637E-03  0.6410E-03  0.6190E-03  0.5977E-03  0.5770E-03  0.5571E-03  0.5377E-03  0.5190E-03  0.5009E-03 
3.3   0.4834E-03  0.4665E-03  0.4501E-03  0.4342E-03  0.4189E-03  0.4041E-03  0.3897E-03  0.3758E-03  0.3624E-03  0.3495E-03 
3.4   0.3369E-03  0.3248E-03  0.3131E-03  0.3018E-03  0.2909E-03  0.2803E-03  0.2701E-03  0.2602E-03  0.2507E-03  0.2415E-03 
3.5   0.2326E-03  0.2241E-03  0.2158E-03  0.2078E-03  0.2001E-03  0.1926E-03  0.1854E-03  0.1785E-03  0.1718E-03  0.1653E-03 
3.6   0.1591E-03  0.1531E-03  0.1473E-03  0.1417E-03  0.1363E-03  0.1311E-03  0.1261E-03  0.1213E-03  0.1166E-03  0.1121E-03 
3.7   0.1078E-03  0.1036E-03  0.9961E-04  0.9574E-04  0.9201E-04  0.8842E-04  0.8496E-04  0.8162E-04  0.7841E-04  0.7532E-04 
3.8   0.7235E-04  0.6948E-04  0.6673E-04  0.6407E-04  0.6152E-04  0.5906E-04  0.5669E-04  0.5442E-04  0.5223E-04  0.5012E-04 
3.9   0.4810E-04  0.4615E-04  0.4427E-04  0.4247E-04  0.4074E-04  0.3908E-04  0.3748E-04  0.3594E-04  0.3446E-04  0.3304E-04 
4.0   0.3167E-04  0.3036E-04  0.2910E-04  0.2789E-04  0.2673E-04  0.2561E-04  0.2454E-04  0.2351E-04  0.2252E-04  0.2157E-04 
4.1   0.2066E-04  0.1978E-04  0.1894E-04  0.1814E-04  0.1737E-04  0.1662E-04  0.1591E-04  0.1523E-04  0.1458E-04  0.1395E-04 
4.2   0.1335E-04  0.1277E-04  0.1222E-04  0.1168E-04  0.1118E-04  0.1069E-04  0.1022E-04  0.9774E-05  0.9345E-05  0.8934E-05 
4.3   0.8540E-05  0.8163E-05  0.7802E-05  0.7456E-05  0.7124E-05  0.6807E-05  0.6503E-05  0.6212E-05  0.5934E-05  0.5668E-05 
4.4   0.5413E-05  0.5169E-05  0.4935E-05  0.4712E-05  0.4498E-05  0.4294E-05  0.4098E-05  0.3911E-05  0.3732E-05  0.3561E-05 
4.5   0.3398E-05  0.3241E-05  0.3092E-05  0.2949E-05  0.2813E-05  0.2682E-05  0.2558E-05  0.2439E-05  0.2325E-05  0.2216E-05 
4.6   0.2112E-05  0.2013E-05  0.1919E-05  0.1828E-05  0.1742E-05  0.1660E-05  0.1581E-05  0.1506E-05  0.1434E-05  0.1366E-05 
4.7   0.1301E-05  0.1239E-05  0.1179E-05  0.1123E-05  0.1069E-05  0.1017E-05  0.9680E-06  0.9211E-06  0.8765E-06  0.8339E-06 
4.8   0.7933E-06  0.7547E-06  0.7178E-06  0.6827E-06  0.6492E-06  0.6173E-06  0.5869E-06  0.5580E-06  0.5304E-06  0.5042E-06 
4.9   0.4792E-06  0.4554E-06  0.4327E-06  0.4112E-06  0.3906E-06  0.3711E-06  0.3525E-06  0.3348E-06  0.3179E-06  0.3019E-06 
5.0   0.2867E-06  0.2722E-06  0.2584E-06  0.2452E-06  0.2328E-06  0.2209E-06  0.2096E-06  0.1989E-06  0.1887E-06  0.1790E-06 
5.1   0.1698E-06  0.1611E-06  0.1528E-06  0.1449E-06  0.1374E-06  0.1302E-06  0.1235E-06  0.1170E-06  0.1109E-06  0.1051E-06 
5.2   0.9964E-07  0.9442E-07  0.8946E-07  0.8475E-07  0.8029E-07  0.7605E-07  0.7203E-07  0.6821E-07  0.6459E-07  0.6116E-07 
5.3   0.5790E-07  0.5481E-07  0.5188E-07  0.4911E-07  0.4647E-07  0.4398E-07  0.4161E-07  0.3937E-07  0.3724E-07  0.3523E-07 
5.4   0.3332E-07  0.3151E-07  0.2980E-07  0.2818E-07  0.2664E-07  0.2518E-07  0.2381E-07  0.2250E-07  0.2127E-07  0.2010E-07 
5.5   0.1899E-07  0.1794E-07  0.1695E-07  0.1601E-07  0.1512E-07  0.1428E-07  0.1349E-07  0.1274E-07  0.1203E-07  0.1135E-07 
5.6   0.1072E-07  0.1012E-07  0.9548E-08  0.9011E-08  0.8503E-08  0.8022E-08  0.7569E-08  --          0.6735E-08  0.6352E-08 
5.7   0.5990E-08  0.5649E-08  0.5326E-08  0.5022E-08  0.4734E-08  0.4462E-08  0.4206E-08  --          0.3735E-08  0.3519E-08 
5.8   0.3316E-08  0.3124E-08  0.2942E-08  0.2771E-08  0.2610E-08  0.2458E-08  0.2314E-08  --          0.2051E-08  0.1931E-08 
5.9   0.1818E-08  0.1711E-08  0.1610E-08  0.1515E-08  0.1425E-08  0.1341E-08  0.1261E-08  --          0.1116E-08  0.1049E-08 
6.0   0.9866E-09  0.9276E-09  0.8721E-09  0.8198E-09  0.7706E-09  0.7242E-09  0.6806E-09  --          0.6009E-09  0.5646E-09 
6.1   0.5303E-09  0.4982E-09  0.4679E-09  0.4394E-09  0.4126E-09  0.3874E-09  0.3637E-09  --          0.3205E-09  0.3008E-09 
6.2   0.2823E-09  0.2649E-09  0.2486E-09  0.2332E-09  0.2188E-09  0.2052E-09  0.1925E-09  --          0.1693E-09  0.1587E-09 
6.3   0.1488E-09  0.1395E-09  0.1308E-09  0.1226E-09  0.1149E-09  0.1077E-09  0.1009E-09  0.9451E-10  0.8854E-10  0.8294E-10 
6.4   0.7769E-10  0.7276E-10  0.6814E-10  0.6380E-10  0.5974E-10  0.5593E-10  0.5235E-10  0.4900E-10  0.4586E-10  0.4292E-10 
6.5   0.4016E-10  0.3758E-10  0.3515E-10  0.3289E-10  0.3076E-10  0.2877E-10  0.2690E-10  0.2516E-10  0.2352E-10  0.2199E-10 
6.6   0.2056E-10  0.1922E-10  0.1796E-10  0.1678E-10  0.1568E-10  0.1465E-10  0.1369E-10  0.1279E-10  0.1195E-10  0.1116E-10 
6.7   0.1042E-10  0.9731E-11  0.9086E-11  0.8483E-11  0.7919E-11  0.7392E-11  0.6900E-11  0.6439E-11  0.6009E-11  0.5607E-11 
6.8   0.5231E-11  0.4880E-11  0.4552E-11  0.4246E-11  0.3960E-11  0.3693E-11  0.3443E-11  0.3210E-11  0.2993E-11  0.2790E-11 
6.9   0.2600E-11  0.2423E-11  0.2258E-11  0.2104E-11  0.1961E-11  0.1826E-11  0.1701E-11  0.1585E-11  0.1476E-11  0.1374E-11 
7.0   0.1280E-11  0.1192E-11  0.1109E-11  0.1033E-11  0.9612E-12  0.8946E-12  0.8325E-12  0.7747E-12  0.7208E-12  0.6706E-12 

(-- marks an entry that is missing from the source.) 

Q(x)     x 
1E-01    1.28155 
1E-06    -- 
1E-07    -- 
1E-08    -- 
1E-09    -- 
1E-10    -- 
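
Entries of the Q-function table can be regenerated from the complementary error function, using the standard relation $Q(x) = \tfrac{1}{2}\operatorname{erfc}(x/\sqrt{2})$. The sketch below is an illustration and not part of the original appendix. It relies on `math.erfc` from the Python standard library; the names `q_function` and `q_inverse`, and the bisection bracket from 0 to 10, are arbitrary choices.

```python
import math

def q_function(x):
    """Q(x) = 0.5 * erfc(x / sqrt(2)), the upper-tail normal probability."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def q_inverse(prob, lo=0.0, hi=10.0):
    """Solve Q(x) = prob for 0 < prob <= 0.5 by bisection (Q is decreasing)."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if q_function(mid) > prob:
            lo = mid          # Q(mid) is still too large, so x lies above mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for x in (0.12, 3.0, 5.0, 7.0):
    print(f"Q({x}) = {q_function(x):.4E}")
print(f"x for Q(x) = 1E-01: {q_inverse(0.1):.5f}")
```

Using `erfc` directly, rather than subtracting a distribution-function value from one, keeps the small tail probabilities accurate.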
Student's t Distribution Function 

$$F_T(t) = \int_{-\infty}^{t} \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\pi\nu}\;\Gamma\left(\frac{\nu}{2}\right)}\left(1 + \frac{x^2}{\nu}\right)^{-(\nu+1)/2} dx$$

Entries are the values of t at which $F_T(t)$ equals the column heading, for ν degrees of freedom. 

  ν    0.60   0.75   0.90   0.95   0.975  0.99   0.995  0.9995 
  1    0.325  1.000  3.078  6.314  12.71  31.82  63.66  636.6 
  2    0.289  0.816  1.886  2.920  4.303  6.965  9.925  31.60 
  3    0.277  0.765  1.638  2.353  3.182  4.541  5.841  12.94 
  4    0.271  0.741  1.533  2.132  2.776  3.747  4.604  8.611 
  5    0.267  0.727  1.476  2.015  2.571  3.365  4.032  6.859 
  6    0.265  0.718  1.440  1.943  2.447  3.143  3.707  5.959 
  7    0.263  0.711  1.415  1.895  2.365  2.998  3.499  5.405 
  8    0.262  0.706  1.397  1.860  2.306  2.896  3.355  5.041 
  9    0.261  0.703  1.383  1.833  2.262  2.821  3.250  4.781 
 10    0.260  0.700  1.372  1.812  2.228  2.764  3.169  4.587 
 11    0.260  0.697  1.363  1.796  2.201  2.718  3.106  4.437 
 12    0.259  0.695  1.356  1.782  2.179  2.681  3.055  4.318 
 13    0.259  0.694  1.350  1.771  2.160  2.650  3.012  4.221 
 14    0.258  0.692  1.345  1.761  2.145  2.624  2.977  4.140 
 15    0.258  0.691  1.341  1.753  2.131  2.602  2.947  4.073 
 16    0.258  0.690  1.337  1.746  2.120  2.583  2.921  4.015 
 17    0.257  0.689  1.333  1.740  2.110  2.567  2.898  3.965 
 18    0.257  0.688  1.330  1.734  2.101  2.552  2.878  3.922 
 19    0.257  0.688  1.328  1.729  2.093  2.539  2.861  3.883 
 20    0.257  0.687  1.325  1.725  2.086  2.528  2.845  3.850 
 21    0.257  0.686  1.323  1.721  2.080  2.518  2.831  3.819 
 22    0.256  0.686  1.321  1.717  2.074  2.508  2.819  3.792 
 23    0.256  0.685  1.319  1.714  2.069  2.500  2.807  3.767 
 24    0.256  0.685  1.318  1.711  2.064  2.492  2.797  3.745 
 25    0.256  0.684  1.316  1.708  2.060  2.485  2.787  3.725 
 26    0.256  0.684  1.315  1.706  2.055  2.479  2.779  3.707 
 27    0.256  0.684  1.314  1.703  2.052  2.473  2.771  3.690 
 28    0.256  0.683  1.313  1.701  2.048  2.467  2.763  3.674 
 29    0.256  0.683  1.311  1.699  2.045  2.462  2.756  3.659 
 30    0.256  0.683  1.310  1.697  2.042  2.457  2.750  3.646 
 40    0.255  0.681  1.303  1.684  2.021  2.423  2.704  3.551 
 60    0.254  0.679  1.296  1.671  2.000  2.390  2.660  3.460 
120    0.254  0.677  1.289  1.658  1.980  2.358  2.617  3.373 
  ∞    0.253  0.674  1.282  1.645  1.960  2.326  2.576  3.291 
Computer Programs for Estimating Correlation Functions and Spectral Density 


The following Fortran programs are useful for estimating autocorrelation functions 
and spectral densities from data that have been stored in a file called 'data'. It is 
assumed that the data consist of sample values from a random time function that has 
been sampled at equally spaced intervals separated by Δt = 1 second. The total 
number of data points is mm, and they should be put in the file called 'data' with a 
format of (f10.7). For purposes of illustration, we have set nn = 31 and mm = 
1000, but any other values can be used so long as nn < mm. 

The first program calculates the sample mean of the data and the autocorrelation 
function R(kΔt) for k = 0, 1, . . . , nn. It also calculates an upper bound on the 
variance of the estimated autocorrelation function using the estimated data values. 
The second program also calculates the estimated autocorrelation function and 
then uses this estimate to make a Hamming-window estimate of the spectral density, 
S(qΔω), q = 0, 1, . . . , nn. 


Estimated Autocorrelation Function 


      parameter (nn = 31, mm = 1000)
c
c     Variable declarations
c
      integer i1, i2, n
      real data(0:(mm - 1)), r(0:nn), mean, var
c
c     Read the mm data values from the file 'data'.
c
      open(unit = 2, file = 'data', status = 'old')
      do 4 i1 = 0, (mm - 1)
4        read (2,5) data(i1)
5     format(f10.7)
c
c     Calculate the mean.  The accumulators are set to zero
c     explicitly because Fortran does not guarantee it.
c
      mean = 0.0
      var = 0.0
      do 10 i1 = 0, (mm - 1)
         mean = mean + data(i1)
10    continue
      mean = mean/mm
      print *, 'The mean is', mean
c
c     Find the estimated autocorrelation function and
c     maximum variance.
c
      print *, ' The autocorrelation values are:'
      do 20 n = 0, nn
         r(n) = 0.0
         do 30 i2 = 0, (mm - n - 1)
            r(n) = r(n) + data(i2)*data(i2 + n)
30       continue
         r(n) = r(n)/(mm - n)
         print *, ' R(', n, ')', r(n)
         var = var + r(n)**2
20    continue
c
c     Bound: (2/mm) times the sum of r(k)**2 over k = -nn..nn,
c     so the k = 0 term enters as r(0)**2.
c
      var = (var*4.0 - 2.0*r(0)**2)/mm
      print *, 'The maximum estimate variance is', var
      stop
      end


Estimated Spectral Density 


      parameter (nn = 31, mm = 1000)
c
c     Variable declarations
c
      integer i1, i2, i5, n, q
      real data(0:(mm - 1)), r(0:nn), rs(-1:(nn + 1)), hs(0:nn), pi
      real sum
c
      pi = 4.0*atan(1.0)
      open(unit = 2, file = 'data', status = 'old')
      do 4 i1 = 0, (mm - 1)
4        read (2,5) data(i1)
5     format(f10.7)
c
c     Find the autocorrelation estimates.
c
      print *, ' The estimated correlation values are:'
      do 20 n = 0, nn
         r(n) = 0.0
         do 30 i2 = 0, (mm - n - 1)
            r(n) = r(n) + data(i2)*data(i2 + n)
30       continue
         r(n) = r(n)/(mm - n)
         print *, ' R(', n, ')', r(n)
20    continue
c
c     Find the Hamming-window estimate of the
c     spectral density.
c
      print *, ' The estimates of spectral density are:'
      do 40 q = 0, nn
         sum = 0.0
         do 50 i5 = 1, (nn - 1)
            sum = 2.0*r(i5)*cos(q*i5*pi/nn) + sum
50       continue
c        The end term uses lag nn, that is, cos(q*pi).
         rs(q) = r(0) + r(nn)*cos(q*pi) + sum
40    continue
      rs(-1) = rs(1)
      rs(nn + 1) = 0.0
c
      do 60 q = 0, nn
         hs(q) = 0.54*rs(q) + 0.23*(rs(q + 1) + rs(q - 1))
         print *, 'S(', q*pi/nn, ') =', hs(q)
60    continue
      stop
      end
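
For readers without a Fortran compiler, the two calculations in this appendix can be sketched in Python. This is a transliteration made under stated assumptions, not the authors' code: the data are taken from a list in memory rather than the file 'data', the function names `autocorrelation` and `hamming_spectrum` are hypothetical, and the end conditions rs(-1) = rs(1) and rs(nn + 1) = 0 are copied from the listing.

```python
import math

def autocorrelation(data, nn):
    """R(k) = sum_i data[i] * data[i + k] / (mm - k) for k = 0..nn."""
    mm = len(data)
    return [
        sum(data[i] * data[i + k] for i in range(mm - k)) / (mm - k)
        for k in range(nn + 1)
    ]

def hamming_spectrum(r):
    """Hamming-smoothed spectral estimate at w = q*pi/nn, q = 0..nn."""
    nn = len(r) - 1
    rs = []
    for q in range(nn + 1):
        s = sum(2.0 * r[k] * math.cos(q * k * math.pi / nn) for k in range(1, nn))
        rs.append(r[0] + r[nn] * math.cos(q * math.pi) + s)
    # Same end conditions as the listing: rs(-1) = rs(1), rs(nn + 1) = 0.
    padded = [rs[1]] + rs + [0.0]
    return [0.54 * padded[q + 1] + 0.23 * (padded[q + 2] + padded[q])
            for q in range(nn + 1)]

samples = [1.0, -1.0] * 50            # toy alternating sequence
r = autocorrelation(samples, 4)
print(r)
print(hamming_spectrum(r))
```

The toy alternating sequence gives correlation estimates that alternate in sign from lag to lag, which makes the indexing easy to confirm by hand.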
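Table of Correlation Function-Spectral Density Pairs 

[Table: the sketches of the correlation functions and spectral densities are not reproduced; the recoverable entries are listed below.] 

Correlation function | Spectral density 
$1 - \dfrac{|\tau|}{T},\ |\tau| \le T$ | $T\,\dfrac{\sin^2(\omega T/2)}{(\omega T/2)^2}$ 
$e^{-a|\tau|}\cos\omega_0\tau$ | $\dfrac{a}{a^2 + (\omega-\omega_0)^2} + \dfrac{a}{a^2 + (\omega+\omega_0)^2}$ 
$\cos\omega_0\tau$ | $\pi\delta(\omega+\omega_0) + \pi\delta(\omega-\omega_0)$ 
$\delta(\tau)$ | $1$ 
$\dfrac{\sin 2\pi W\tau}{2\pi W\tau}$ | Low-pass spectrum, constant for $-2\pi W \le \omega \le 2\pi W$ 
$\dfrac{\sin(B\tau/2)}{B\tau/2}\cos\omega_0\tau$ | Band-pass spectrum, bands centered at $-\omega_0$ and $\omega_0$ 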
Contour Integration 


Integrals of the following types are frequently encountered in the analysis of linear 
systems: 

$$\frac{1}{2\pi j}\int_{c-j\infty}^{c+j\infty} F(s)e^{st}\,ds \qquad \text{(I-1)}$$

$$\frac{1}{2\pi j}\int_{-j\infty}^{j\infty} S_X(s)\,ds \qquad \text{(I-2)}$$


The integral of (I-1) is the inversion integral for the Laplace transform, while 
(I-2) represents the mean-square value of a random process having a spectral density 
$S_X(s)$. Only in very special cases can these integrals be evaluated by elementary 
methods. However, because of the generally well-behaved nature of their 
integrands, these integrals can frequently be evaluated very simply by the method of 
residues. This method of evaluation is based on the following theorem from 
complex variable theory: if a function F(s) is analytic on and interior to a closed 
contour C, except at a number of poles, then the integral of F(s) around the 
contour is equal to $2\pi j$ times the sum of the residues at the poles within the 
contour. In equation form this becomes 

$$\oint_C F(s)\,ds = 2\pi j \sum \text{residues at poles enclosed} \qquad \text{(I-3)}$$


What is meant by the left-hand side of (I-3) is that the value of $F(s)$ at each point
on the contour, $C$, is to be multiplied by the differential path length and summed
over the complete contour. The contour is to be traversed counterclockwise.
Reversing the direction introduces a minus sign on the
right-hand side of (I-3).
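
The statement of (I-3) can be checked numerically. The following is a minimal sketch, not part of the original development: the helper `contour_integral` and the test function $F(s) = 1/[s(s + 2)]$ are chosen here purely for illustration. The contour is a circle, parameterized as $s = s_c + re^{j\theta}$, and the sum over the contour is approximated with equally spaced points.

```python
import cmath
import math

def contour_integral(f, center, radius, n=2000, clockwise=False):
    """Approximate the integral of f(s) ds around a circle with n equally spaced points."""
    sign = -1.0 if clockwise else 1.0
    dtheta = sign * 2.0 * math.pi / n
    total = 0j
    for k in range(n):
        point = radius * cmath.exp(1j * k * dtheta)
        # s = center + point, and ds = j * point * dtheta on the circle
        total += f(center + point) * 1j * point * dtheta
    return total

def F(s):
    # Illustrative integrand with simple poles at s = 0 and s = -2
    return 1.0 / (s * (s + 2.0))

small = contour_integral(F, 0.0, 1.0)                    # encloses s = 0 only
large = contour_integral(F, 0.0, 3.0)                    # encloses both poles
reverse = contour_integral(F, 0.0, 1.0, clockwise=True)  # same circle, clockwise
```

The residues of this function are 1/2 at $s = 0$ and $-1/2$ at $s = -2$, so `small` should be close to $\pi j$, `large` close to zero, and `reverse` close to $-\pi j$.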

In order to utilize (I-3) for the evaluation of integrals such as (I-1) and (I-2),
two further steps are required: We must learn how to find the residue at a pole
and then we must reconcile the closed contour in (I-3) with the apparently open
paths of integration in (I-1) and (I-2).

Consider the problem of poles and residues first. A single-valued function $F(s)$
is analytic at a point, $s = s_0$, if its derivative exists at every point in the
neighborhood of (and including) $s_0$. A function is analytic in a region of the s-plane if
it is analytic at every point in that region. If a function is analytic at every point
in the neighborhood of $s_0$, but not at $s_0$ itself, then $s_0$ is called a singular point.
For example, the function $F(s) = 1/(s - 2)$ has a derivative $F'(s) = -1/(s - 2)^2$.
It is readily seen by inspection that this function is analytic everywhere except
at $s = 2$, where it has a singularity. An isolated singular point is a point interior
to a region throughout which the function is analytic except at that point. It is
evident that the above function has an isolated singularity at $s = 2$. The most
frequently encountered singularity is the pole. If a function $F(s)$ becomes infinite
at $s = s_0$ in such a manner that by multiplying $F(s)$ by a factor of the form
$(s - s_0)^n$, where $n$ is a positive integer, the singularity is removed, then $F(s)$ is said to
have a pole of order $n$ at $s = s_0$. For example, the function $1/\sin s$ has a pole at
$s = 0$ and can be written as


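$$F(s) = \frac{1}{\sin s} = \frac{1}{s - \dfrac{s^3}{3!} + \dfrac{s^5}{5!} - \cdots}$$

Multiplying by $s$ [that is, the factor $(s - s_0)$] we obtain

$$sF(s) = \frac{s}{s - \dfrac{s^3}{3!} + \dfrac{s^5}{5!} - \cdots} = \frac{1}{1 - \dfrac{s^2}{3!} + \dfrac{s^4}{5!} - \cdots}$$

which is seen to be well-behaved near $s = 0$. It may therefore be concluded that
$1/\sin s$ has a simple (that is, first order) pole at $s = 0$.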

It is an important property of analytic functions that they can be represented by
convergent power series throughout their region of analyticity. By a simple extension
of this property it is possible to represent functions in the vicinity of a singularity.
Consider a function $F(s)$ having an $n$th order pole at $s = s_0$. Define a
new function $\phi(s)$ such that

$$\phi(s) = (s - s_0)^n F(s) \tag{I-4}$$

Now $\phi(s)$ will be analytic in the region of $s_0$ since the singularity of $F(s)$ has been
removed. Therefore, $\phi(s)$ can be expanded in a Taylor series as follows:

$$\phi(s) = A_{-n} + A_{-n+1}(s - s_0) + A_{-n+2}(s - s_0)^2 + \cdots + A_{-1}(s - s_0)^{n-1} + \sum_{k=0}^{\infty} B_k (s - s_0)^{n+k} \tag{I-5}$$


Substituting (I-5) into (I-4) and solving for $F(s)$ gives

$$F(s) = \frac{A_{-n}}{(s - s_0)^n} + \frac{A_{-n+1}}{(s - s_0)^{n-1}} + \cdots + \frac{A_{-1}}{s - s_0} + \sum_{k=0}^{\infty} B_k (s - s_0)^k \tag{I-6}$$

This expansion is valid in the vicinity of the pole at $s = s_0$. The series converges
in a region around $s_0$ that extends out to the nearest singularity. Equation (I-6) is
called the Laurent expansion or Laurent series for $F(s)$ about the singularity at
$s = s_0$. There are two distinct parts to the series: The first, called the principal
part, consists of the terms containing $(s - s_0)$ raised to negative powers; the
second, sometimes called the Taylor part, consists of terms containing $(s - s_0)$
raised to zero or positive powers. It should be noted that the second part is analytic
throughout the s-plane (except at infinity) and assumes the value $B_0$ at
$s = s_0$. If there were no singularity in $F(s)$, only the second part of the expansion
would be present and would just be the Taylor series expansion. The coefficient
of $(s - s_0)^{-1}$, which is $A_{-1}$ in (I-6), is called the residue of $F(s)$ at the pole at
$s = s_0$.

Formally the coefficients of the Laurent series can be determined from the usual
expression for the Taylor series expansion for the function $\phi(s)$ and the subsequent
division by $(s - s_0)^n$. For most cases of engineering interest, simpler methods
can be employed. Because the Laurent expansion of a function about a given point is
unique, it follows that any series of the proper form [that is, the form given in (I-6)] must,
in fact, be the Laurent series. When $F(s)$ is a ratio of two polynomials in $s$, a
simple procedure for finding the Laurent series is as follows: Form $\phi(s) =
(s - s_0)^n F(s)$; let $s - s_0 = v$ or $s = v + s_0$; expand $\phi(v + s_0)$ around $v = 0$
by dividing the denominator into the numerator; and replace $v$ by $s - s_0$. As an
example consider the following:


$$F(s) = \frac{2}{s^2(s^2 - 1)}$$

Let it be required to find the Laurent series for $F(s)$ in the vicinity of $s = -1$:

$$\phi(s) = (s + 1)F(s) = \frac{2}{s^2(s - 1)}$$

Let $s = v - 1$:

$$\phi(v - 1) = \frac{2}{(v - 1)^2(v - 2)} = \frac{2}{v^3 - 4v^2 + 5v - 2}$$

Dividing the denominator, written in ascending powers as $-2 + 5v - 4v^2 + v^3$, into
the numerator gives

$$\phi(v - 1) = -1 - \frac{5}{2}v - \frac{17}{4}v^2 - \cdots$$

Replacing $v$ by $s + 1$ gives

$$\phi(s) = -1 - \frac{5}{2}(s + 1) - \frac{17}{4}(s + 1)^2 - \cdots$$

$$F(s) = -\frac{1}{s + 1} - \frac{5}{2} - \frac{17}{4}(s + 1) - \cdots$$

The residue is seen to be $-1$.
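
The division of the denominator into the numerator is easy to mechanize. The sketch below is only an illustration: it takes $F(s) = 2/[s^2(s^2 - 1)]$ as the function being expanded about $s = -1$, so that $\phi(v - 1) = 2/(-2 + 5v - 4v^2 + v^3)$, and the helper name `series_quotient` is hypothetical. Exact fractions are used so that the coefficients can be compared directly with a hand calculation.

```python
from fractions import Fraction

def series_quotient(numerator, denominator, terms):
    """Divide two polynomials in v (coefficients listed lowest power first).

    Returns the first `terms` coefficients of the power series of the quotient.
    The constant term of the denominator must be nonzero.
    """
    num = [Fraction(c) for c in numerator]
    den = [Fraction(c) for c in denominator]
    quotient = []
    for k in range(terms):
        acc = num[k] if k < len(num) else Fraction(0)
        # Remove the contributions of the quotient terms already found
        for i in range(1, min(k, len(den) - 1) + 1):
            acc -= den[i] * quotient[k - i]
        quotient.append(acc / den[0])
    return quotient

# phi(v - 1) = 2 / (-2 + 5v - 4v^2 + v^3)
coeffs = series_quotient([2], [-2, 5, -4, 1], terms=4)
residue = coeffs[0]   # coefficient of 1/(s + 1) in the Laurent series of F(s)
```

The first entry of `coeffs` is the residue, and the remaining entries are the coefficients of the Taylor part of the series in powers of $(s + 1)$.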
A useful formula for finding the residue at an $n$th order pole, $s = s_0$, is as
follows:¹

$$A_{-1} = \frac{\phi^{(n-1)}(s_0)}{(n - 1)!} \tag{I-7}$$

where $\phi(s) = (s - s_0)^n F(s)$. This formula is valid for $n = 1$ and is not restricted
to rational functions.
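
As an illustration of (I-7) at a pole of order greater than one, consider the following short worked case. The function used here is chosen only for illustration and does not appear elsewhere in this appendix.

```latex
% Illustrative function with a second-order pole at s = 0 (n = 2, s_0 = 0)
F(s) = \frac{1}{s^2 (s + 1)}, \qquad
\phi(s) = s^2 F(s) = \frac{1}{s + 1}, \qquad
\phi'(s) = -\frac{1}{(s + 1)^2}

% Residue from (I-7)
A_{-1} = \frac{\phi'(0)}{1!} = -1

% Check against the partial fraction expansion
\frac{1}{s^2 (s + 1)} = \frac{1}{s^2} - \frac{1}{s} + \frac{1}{s + 1}
```

The coefficient of $1/s$ in the partial fraction expansion is $-1$, in agreement with the value obtained from the formula.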

When $F(s)$ is not a ratio of polynomials it is permissible to replace transcendental
terms by series valid in the vicinity of the pole. For example

$$F(s) = \frac{\sin s}{s^2} = \frac{1}{s^2}\left(s - \frac{s^3}{3!} + \frac{s^5}{5!} - \cdots\right) = \frac{1}{s} - \frac{s}{3!} + \frac{s^3}{5!} - \cdots$$

In this instance the residue of the pole at $s = 0$ is 1.

There is a direct connection between the Laurent series and the partial fraction
expansion of a function $F(s)$. In particular, if $H_j(s)$ is the principal part of the
Laurent series at the pole $s = s_j$, then the partial fraction expansion of $F(s)$ can
be written as

$$F(s) = H_1(s) + H_2(s) + \cdots + H_k(s) + q(s)$$

where the first $k$ terms are the principal parts of the Laurent series about the $k$
poles and $q(s)$ is a polynomial $a_0 + a_1 s + a_2 s^2 + \cdots + a_m s^m$ representing the
behavior of $F(s)$ for large $s$. The value of $m$ is the degree of the
numerator polynomial minus the degree of the denominator polynomial. In general
$q(s)$ can be determined by dividing the denominator polynomial into the numerator
polynomial until the remainder is of lower order than the denominator. The remainder
can then be expanded in its principal parts.
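
A short worked case may help to fix the procedure. The function below is chosen only for illustration; its numerator is of degree three and its denominator of degree two, so that $m = 1$.

```latex
% Illustrative improper rational function
F(s) = \frac{s^3 + 1}{s(s + 2)} = \frac{s^3 + 1}{s^2 + 2s}

% Divide the denominator into the numerator until the remainder is of lower order
F(s) = s - 2 + \frac{4s + 1}{s(s + 2)}, \qquad q(s) = s - 2

% Expand the remainder in its principal parts (simple poles at s = 0 and s = -2)
H_1(s) = \frac{1}{s} \left. \frac{4s + 1}{s + 2} \right|_{s = 0} = \frac{1/2}{s}, \qquad
H_2(s) = \frac{1}{s + 2} \left. \frac{4s + 1}{s} \right|_{s = -2} = \frac{7/2}{s + 2}

% Complete expansion
F(s) = \frac{1/2}{s} + \frac{7/2}{s + 2} + s - 2
```

Recombining the two principal parts gives $(4s + 1)/[s(s + 2)]$, which confirms the expansion.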

¹Here $\phi^{(n-1)}(s_0)$ denotes the $(n - 1)$th derivative of $\phi(s)$, with respect to $s$, evaluated at $s = s_0$.

With the question of how to determine residues out of the way, the only remaining
question is how to relate the closed contour of (I-3) with the open
(straight line) contours of (I-1) and (I-2). This is handled quite easily by restricting
consideration to integrands that approach zero rapidly enough for large values of
the variable so that there will be no contribution to the integral from distant portions
of the contour. Thus, although the specified path of integration in the s-plane
may be from $s = c - j\infty$ to $c + j\infty$, the integral that will be evaluated will
have a path of integration as shown in Figure I-1. The path of integration will be
along the contour $C_1$ going from $c - jR_0$ to $c + jR_0$ and then along the contour
$C_2$, which is a semicircle closing to the left. The integral can then be written as


$$\oint_{C_1 + C_2} F(s)\, ds = \int_{C_1} F(s)\, ds + \int_{C_2} F(s)\, ds \tag{I-8}$$

If in the limit as $R_0 \to \infty$ the contribution from the right-hand integral is zero,
then we have

$$\int_{c-j\infty}^{c+j\infty} F(s)\, ds = \lim_{R_0 \to \infty} \oint_{C_1 + C_2} F(s)\, ds = 2\pi j \sum (\text{residues at poles enclosed}) \tag{I-9}$$


In any specific case the behavior of the integral over $C_2$ can be examined to verify
that there is no contribution from this portion of the contour. However, the following
two special cases will cover many of the situations in which this problem
arises:

1. Whenever $F(s)$ is a rational function having a denominator whose order
exceeds that of the numerator by at least two, then it may be shown readily that

$$\int_{c-j\infty}^{c+j\infty} F(s)\, ds = \oint_{C_1 + C_2} F(s)\, ds$$

2. If $F_1(s)$ is analytic in the left half plane except for a finite number of poles
and tends uniformly to zero as $|s| \to \infty$ with $\sigma < 0$, then for positive $t$ the
following is true (Jordan's lemma)

$$\lim_{R_0 \to \infty} \int_{C_2} F_1(s) e^{st}\, ds = 0$$





Figure I-1 Path of integration in the s-plane. 
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From this it follows that when these conditions are met, the inversion integral of 
the Laplace transform can be evaluated as

$$f(t) = \frac{1}{2\pi j} \int_{c-j\infty}^{c+j\infty} F_1(s) e^{st}\, ds = \frac{1}{2\pi j} \oint_{C_1 + C_2} F_1(s) e^{st}\, ds = \sum_j k_j$$

where $k_j$ is the residue at the $j$th pole to the left of the abscissa of absolute
convergence.
The following two examples illustrate these procedures: 

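Example I-1. Given that a random process has a spectral density of the following
form

$$S_X(\omega) = \frac{1}{(\omega^2 + 1)(\omega^2 + 4)}$$

Find the mean-square value of the process. Converting to $S_X(s)$ gives

$$S_X(s) = \frac{1}{(-s^2 + 1)(-s^2 + 4)} = \frac{1}{(s^2 - 1)(s^2 - 4)}$$

and the mean-square value is

$$\overline{X^2} = \frac{1}{2\pi j} \int_{-j\infty}^{j\infty} \frac{ds}{(s^2 - 1)(s^2 - 4)} = k_{-1} + k_{-2}$$

From the partial fraction expansion for $S_X(s)$, the residues are found to be

$$k_{-1} = \frac{1}{(-1 - 1)(1 - 4)} = \frac{1}{6}$$

$$k_{-2} = \frac{1}{(4 - 1)(-2 - 2)} = -\frac{1}{12}$$

Therefore,

$$\overline{X^2} = \frac{1}{6} - \frac{1}{12} = \frac{1}{12}$$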
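
The result of Example I-1 can be confirmed by integrating the spectral density $S_X(\omega) = 1/[(\omega^2 + 1)(\omega^2 + 4)]$ directly along the real frequency axis. The sketch below is only a numerical check, not the method of the text; the substitution $\omega = \tan\theta$ and the function names are choices made here so that the infinite range maps onto a finite interval.

```python
import math

def spectral_density(w):
    # S_X(w) = 1 / ((w^2 + 1)(w^2 + 4)), the density of Example I-1
    return 1.0 / ((w * w + 1.0) * (w * w + 4.0))

def mean_square(density, n=20000):
    """Evaluate (1/2pi) times the integral of density(w) over all w.

    Uses w = tan(theta), dw = d(theta)/cos^2(theta), with the midpoint rule
    on -pi/2 < theta < pi/2 so that the end points are never evaluated.
    """
    h = math.pi / n
    total = 0.0
    for k in range(n):
        theta = -math.pi / 2.0 + (k + 0.5) * h
        total += density(math.tan(theta)) / math.cos(theta) ** 2 * h
    return total / (2.0 * math.pi)

numeric = mean_square(spectral_density)
by_residues = 1.0 / 6.0 - 1.0 / 12.0   # sum of the residues at s = -1 and s = -2
```

The two values agree closely, which provides an independent check on the signs of the residues.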
Example I-2. Find the inverse Laplace transform of $F(s) = \dfrac{1}{s(s + 2)}$.

$$f(t) = \frac{1}{2\pi j} \int_{c-j\infty}^{c+j\infty} \frac{e^{st}}{s(s + 2)}\, ds = k_0 + k_{-2}$$

From (I-7)

$$k_0 = \left. \frac{e^{st}}{s + 2} \right|_{s=0} = \frac{1}{2}$$

$$k_{-2} = \left. \frac{e^{st}}{s} \right|_{s=-2} = -\frac{e^{-2t}}{2}$$

therefore

$$f(t) = \frac{1}{2}\left(1 - e^{-2t}\right) \qquad t > 0$$
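
The inversion of $F(s) = 1/[s(s + 2)]$ in Example I-2 can also be checked numerically by computing each residue of $F(s)e^{st}$ as a contour integral around a small circle centered on the pole. This is a minimal sketch under stated assumptions: the helper names `residue` and `f_of_t` are hypothetical, and the circle radius of 0.5 is chosen so that each circle encloses only one of the two poles.

```python
import cmath
import math

def residue(f, pole, radius=0.5, n=400):
    """Residue of f at an isolated pole: (1/2 pi j) times the integral around a small circle."""
    total = 0j
    for k in range(n):
        theta = 2.0 * math.pi * k / n
        offset = radius * cmath.exp(1j * theta)
        total += f(pole + offset) * 1j * offset   # f(s) times ds/d(theta)
    return total * (2.0 * math.pi / n) / (2j * math.pi)

def f_of_t(t):
    """Sum of the residues of e^{st} / (s (s + 2)) at s = 0 and s = -2."""
    def integrand(s):
        return cmath.exp(s * t) / (s * (s + 2.0))
    return sum(residue(integrand, p) for p in (0.0, -2.0)).real

def closed_form(t):
    # f(t) = (1/2)(1 - e^{-2t}) for t > 0
    return 0.5 * (1.0 - math.exp(-2.0 * t))

values = [(t, f_of_t(t), closed_form(t)) for t in (0.5, 1.0, 2.0)]
```

Each entry of `values` holds the time, the sum of the numerically computed residues, and the closed-form result for comparison.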
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Selected Answers 


CHAPTER 1 
1-1.1 b) 1.633; 2.237
l-4.1 a% bY c)» 
l-4.2 a)Vs b)Vs c) 
I-4.3 a)vV& b)'^ c)0.0602 
1-4.4 a) Yo b)*%o c)?9^6 
1I-4.5 a)“s b)'^^o c)Vs d) Yo 
1-4.6 a)Yio b) Yo c) "iso 
1-61 a)" bx 07⁄7 d)l e% f)?^^ 
1-62. a)*3 b)%2 c)"s d)%2 e) f)O g)V5 h)” DO 
1-6.3 a) 0.0723 b) 0.1493 c) 0.9955 d) 0.7826 e) 0.0769 f) 0.2174
1-6.5 0.8075
1-7.1 a) 0.9246 b) 0.9468 c) 0.062
1-7.2 a) 0.0501 b) 0.0729
1-7.3 a02 b)'^ oz 
1-7.4 74 
1-7.5 0.99639 
1-7.6 a01 bxy 
1-7.7 a)" bu own 
1-8.1 Dependent 
1-8.2 a) Independent b) Independent c) Independent 
1-8.4 a) Independent b) Independent 
1-9.1] b) 
1-9.2 b) 0.7298 c) 0.3001 d) 0.0111
1-10.1 a) 0.1406 b) 0.0156
1-10.4 a) 0.0123 b) 0.5926
1-10.5 a) 350 b) 150 c) 105
1-10.6 a) 0.6811 b) 0.2701 c) 0.04877 d) 0.3189
1-10.7  a)lO0 b)4 c)5 
1-10.8 | a) 0.1268 b) 4.32 x 10? c) 4.36 x 10 * 


CHAPTER 2 


b) 0.37597 c) 0.05469 
a)O b)0.125 c)0.5 
a) 1 b)0.6321 c)0.3679 d) 0.8647 
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2-2.4 


T l 
ajA = y; b = 1 b > 
2-3.1  b)0.7734 c¢) 0.1718 
2-3.2  b)0.2325 c) 0.6321 
2-3.4  b)0.9653 €)9.158 x 10^? 
2-4.1 a)2 b)5 oc)! 

2-4.2 a)O b)0.758 c)O d)0.758 


; 2 
2-4.3 a)Vs b)4 c) 18 d)2 e)-16 f) ——-6 
n 


+ 2 
2-4.4 a) 12.5 b) 93.75; 9.68 c) 0.2373
2-5.1 a) 0.9772 b) 0.4772 c) 0.0228
2-5.2 a) 1875 b) 26,875 c) 0 d) 1750
2-5.3 a) 1 b) 4 c) 0.8413
2-5.4 a)0.2867 x 10 ^ b) 0.9987 
2-6.] a) 12 b) 288; 0.0832 
2-6.2 bO ec) 15 
2-6.3 a)32 b) 0.3127 c) 0.1054 
2-6.4 a) = 3 b)3.6 c) 0.003865 
2-6.5 a)4.452 x 10* b)O ec) 547.7 
2-6.6 a)5 b) 10 c)3 


ac EN NN i 14 l 
2-7.1 LE Tc b)O c)'^ d)'^ 
2-7.2 +a) 58 b)64 c)6 

2-7.3 a) 0.632 b) 0.1353 c) 0.0667
2-7.4 a) 4000 b) 0.2865 c) 0.3935
2-7.5 b) 0 c) 21
2-8.1 a) 0.3679 b) 8


2-8.2 a aom ] s 
T 


mV I- x 
2-8.3 a) 0.208 b) 0.944
2-8.4 a) 5.1856 b) 8.0044
2-9.1 a) 7.979 b) 12.533
2-92 apre "7-70 r= r b) 1.376 
2-9.3 b) 2.55 
2-9.4 a)%;% bo ec) 25 


CHAPTER 3 


3-1.1 | €) We 
3-1.2 a)4 b)xy 


3-13 a)r b)+ 
3-1.4 a)% b)12.25 c)1.429 
3-2.1 b) 7.979 


2x Ozxz-Lk 


0-zx 
a, 3A 6 
| Ow "o while 


"Os 


= | 
= | 
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3-2.2 a)2x b) = 2y 

3-2.3 b5 c2 

3-2.4 a) —-2y b) -6 

3-3.1 a) : b)s c)*54:5 


3 In 2 
3-3.2 W and V are statistically independent 
3-4.1  a)76 b)28 c)28;76 
3-4.2  2)481; 325 b) —0.1644 


3-4.3 a)l6 b)% c)421 


3 
3-44 a)3 yo gl 


3-5.1 b)025 
3-5.2 b)% c)0.68 

3-5.3 a)?» b) 0.161 

3-6.1 *%(e' — e ")u(r) 

3-63 ap b)p cp;p — 3p 


CHAPTER 4 
4-2.1  a)0.3727 b) “2 c)455 

4-2.2 a)0.2 c) 10° 

4-2.3 a) 0.9897 b) 10 c) 17
4-2.4 a) 12.5 b) 11.218 c) 12
4-2.5 a) 0.663 b) 0.689

4-3.1 a)0.0617 b) LO x 10^? 

4-3.2 5002 

4-3.3 322 

4-4.1  a)0.064 b) 0.0602 

4—4.2 a) 118.66 = X = 121.34 b) 116.42 = X = 123.58 
4-43 a 118.955 X=m b) 117.215 Xs 
4-5.1 a) Claim justified b) Claim rejected 
4-5.2 a) Claim rejected b) Claim rejected 
4-5.3 a)91% b) X = 3.997 years 

4—5.4 a2) 99.78 b) 38.94: 6.24 c) Claim valid 
4—6.1  b)y = 0.5455 + 0.6344x 

4—6.2  b)y = 326.33 — 15.14z 


CHAPTER 5 
5-1.1 b) 7776 c) Vrzz&s d) 1577% 
5-1.2 a)2.5 b)5 c)5 

2 


l — jf} TE, 
5-2.2 a) e DAI b)35(X,) c) 8(X,X,) 
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5-3.2 a)3 b)9 +3 c)40 

5—4.2 a) Nonstationary b) Wide-sense stationary. 
5-6.1  a)0.0362 b)i 

5-6.2 1.054 


CHAPTER 6 


a) 0.606 b)3.16 c) 16.07 
6-1.2 a) Wide-sense stationary b) Wide-sense stationary c) Not wide-sense station- 
ary d) Wide-sense stationary. 
6-2.1 a) 7,/2T 


] T, Ir] : 
b-—i1- — =É =T 
) 2T | J | I| 1 
I 7, Ir — G — KT i : 
== |] — [l +k — — k)T| = T 
AT | T, J r- )T| I 
ü elsewhere 
6-2.2 | T, lr — kT| 
=- — |1 — = <k = T. 
AT T. r« M ST 
0 elsewhere 
6-2.3 i G(T) 
B T 


3 
6-3.2 b)9 c)— sin 6r 
2T 


6-3.3 a) +6; 146; 110 b)0, 1, 3, Hz c) 0.318 
6-3.4 Only when T = 2 

6-4.1  a)0.0362 b) —0.0460 c) —0.0387 
6-4.2  2a)0.1740 b) 0.1664 

6-4.3 a) 0.959 — 35.52 |r| b) 0.0345 
6—4.4 2118 

6-5.1 Ave" 

6-5.2 2A*%e "n 

6-5.3 a) 10.0 b) 10.0 c) 10 

654 a)X = 0:0, =5_ b) Differentiable 
6-14 327 — De *- '? 

6-8.1 a) 10.00005 b) 0.146 


6-8.2 a) = sin KO — 4)] 


6—8.3 Tmax = | 

‘ Tmin — 0.1 
6-8.4 0.281 nsec. 
6-9.2 3 


d) 60 m/sec. 
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CHAPTER 7 


7-1.1 


7-2.2 
7-2.3 
7-3.1 
7-3.2 
7-3.3 
7-3.4 


7-4.1 


7-4.2 
7-5.2 
7-5.3 
7-5.4 


7—-6.1 
7-6.2 
7—-6.3 
7—6.4 
7-7.2 
7-8.1 
7-8.2 


7-9.1 
7-9.2 
7-10.2 


7-10.4 


a) 6 yay 52 às 
w w T 


4 
a)l6 b)l c)l6 d)4 
a) No b) Yes c) Yes d)No e)Yes f) No 
a)3 b)83.5 
a)4 b)40 c)O, + 6, + 12 
a)l b)i 

16(—5^ + 36) Poles: + 


2 | 
ee, B c) 0.573 
UES Bor wenx EE ) 
c) 11.39 
a) 11.2 b) 11.2 
0.05 
7 
800 sin*(0.025«) 
a) 10 b) ——SÀE C c) R(T) = 0; SA) = 0 
a)24 c) 0.0403 
a) 595 c) 594 


a) %s+0 b)4e ^ 
sin 100077 

a) o| 100077 | b)O c)0; 4.135 

a)l bO c)'55?.16 

16 — ow 

16 + o 

c) 0.00299 

Hamming window has lower sidelobes. 

a) 19.2 b) 79.4 


a)2B,V. «/2-1 b)2BV «(100-1 





CHAPTER 8 


8-2.1 
8-2.2 


8—3.1 
8-3.2 
8-3.3 
8—3.4 
8—4.] 


b)5 c) 64 
a) = B cos (20t + 0) — T sin (20r + 0) b)O c) 0.887 


a)O b)5.0 c)5 
a0 b8e ! ec) 8e! 
a)O b)'s c)s 


6.25 
a)0.5e 9" b) % 
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a) % b)?Yi ¢) %2 — 3|r| + 4r — sp] 


2A 
3 
Rxy(T) = 2- r 
Ryx(T) = "A+ 


a) 73A? 
a) z b)'4 x IO c) 34 x 107? 


cf xm] 


: otherwise 


a) 5.7 x 107° b) 2.65 x 10° c) 1.6 x 10° 


ost) s(s° — 1) 
a) 3s + 1 " (95? — 1)(s? — 16) 
46656 
—s* + 46656 
a (a ES 1) 
2(9w* + Do + 16) 
20 x 216° 
(c^ + 1)(w° + 46656) 
b) 0.04 
a)l b)1.047B,; 
a) 0.1875 b) 12 c)12 


l 
b) 0.13 c) 2.63 
Edd = e 


l = 
— x 107° 
T 





b) 


a) 


a) b) 20 





a) 


W TT 


2 M 1.386 





FE -i 
a) F, + T b) 6.1 x IO c) 1.57 


CHAPTER 9 


9-2.1 


9-2.2 


9—4.1 
9-4.2 
9—4.3 
9-4.4 
9-5.1 
9-5.2 


Maximum signal-to-noise ratio: (c), (d) 


4 


cy) 


Minimum mean-square error: (a), (b), (e), (f) 


40b? " Ap? 
Ye + (0600Y] ” mib + (160r) 
a) 80 b) 1.1 

0.648 

3 b) 9.16 


a) 1.59 b) 1.43 
a)s(2 — Du(t) b) 46.67 c) 20 
a) 50 b) 2.661 


c) 


2(99? + Do? + 16) + 4) 


2p 
Smib? + (1607)'] 


9-5.3 
9-5.5 


9-6. 1 


9-6.2 


9-6.3 
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a)N b) 100 
a) 10^ 4e- 099.6 x 10-5 — u(t) b) 0.5 x 19733 
16 
0.lw* + 17.6 ons 
9.27 


a) —— 
s+ NV 176 
b) 4 


a) 


b) 0.927 
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A 


Analog-to-digital conversion, errors in, 81 
A posteriori probability, continuous, 118 
discrete, 26 
Applications of probability, 2 
A priori probability, continuous, 118 
discrete, 26 
Autocorrelation function, 190 
binary process, 193 
of derivatives, 210 
examples of, 204 
computer program for estimating, 
383 
properties of, 196 
relation to spectral density, 251, 387 
of system output, 292 
Automatic control system, 318 
Axioms, of probability, 19 


B 


Bandlimited white noise, 257 

Bandwidth, equivalent-noise, 314, 316 
half-power, 315 

Bayes’ theorem, continuous, 116 
discrete, 26 

Bernoulli distribution, 140, 371 

Bernoulli trials, 32 

Beta distribution, 373 

Binary signals, autocorrelation function, 

193 

probability density function, 86 
spectral density, 271 

Binomial coefficient, 33 
table of, 377 

Binomial distribution, 372 




Index 


C


Cartesian product space, 30
Cauchy distribution, 373 
Central limit theorem, 69 
Central moments, 61 
example of, 62 
of Gaussian random variable, 69 
Certain event, 19 
Characteristic function, 131 
Bernoulli distribution, 140 
definition, 131 
moments from, 133 
of sums, 132, 133 
for two random variables, 132 
Chi-square distribution, 77, 373 
Combined experiments, 30 
Complement, of sets, 16 
Conditional expectation, 91 
mean, 91 
Conditional probability, 22, 88 
density and distribution functions, 88 
for two random variables, 115 
Confidence interval, 155 
Continuous probability, 7 
Continuous random process, 172 
Contour integration, 388 
Convolution, of density functions, 128 
Correlation, 113, 189 
coefficient, 123 
for independent random variables, 124 
matrix, 216 
Correlation functions, 190 
autocorrelation, 190 
crosscorrelation, 207 
of derivatives, 210 
in matrix form, 216 
table of spectral density pairs, 387 
time autocorrelation, 191 




Covariance, 123 


matrix, 218 


Crosscorrelation functions, 207 


of derivatives, 210 

examples of, 211 

between input and output of a system, 
297 


properties of, 209 


Cross-spectral density, 259 
Curve fitting, 161
least-squares fit, 163


D


Definite integrals, table of, 367 
Delta distributions, 86 


binary, 86 
multi-level, 87 


DeMoivre-Laplace theorem, 35 
DeMorgan's laws, 16 
Density function, probability, 54 


definition of, 54 
properties of, 54 
table of, 371 
transformations, 56 
types of, 54 


Density, spectral, 230 

Deterministic random process, 175 
Digital communication system, 40, 44 
Discrete probability, 7 

Discrete random process, 172 
Distribution function, probability, 50
definition of, 50
properties of, 50
table of, 371
types of, 54


E


Electronic voltmeter, scale factor, 108 
Elementary event, 19 

Elements, of a set, 13 

Empty set, 14 

Ensemble, 49
averaging, 60


Equivalent-noise bandwidth, 314, 316 
Ergodic random process, 179 
Erlang distribution, 85, 373 
Erlang random variable, 85 
Error function, 66 
Estimates, autocorrelation function, 201 
computer programs for, 383 
mean value, 144, 181 
signal in noise, 118 
spectral density, 262 
variance, 149 
Estimates, mean of, 144, 182 
variance, 145, 183 
Estimation, from p(x/y), 118 
Event, certain, 19 
composite, 6 
definition of, 6 
elementary, 6, 19 
impossible, 19 
probability of, 19 
for a random variable, 50 
Expected value, 60 
Experiment, 5 
Exponential distribution, 83, 374 
applications of, 84 


F 


Finite-time integrator, 301 
Fourier transforms, 231 
relation to spectral density, 232 
table of, 368 


Frequency-domain analysis of systems, 307
cross-spectral density, 312 
examples of, 314 
mean-square value, 311 
spectral density, 308 


G 


Gamma distribution, 85, 374 
Gaussian random process, 183 
multi-dimensional, 220, 376 


Gaussian random variable, 64 
density function, 64 
distribution function, 65 
moments of, 69 

General moments, 61 


H 


Hamming-window function, 265 

Hanning-window function, 281 

Hypothesis testing, 157 
examples, 158 


I


Identities, trigonometric, table of, 365
Impossible event, 19 
Impulse response, 285 

measurement of, 298, 304 

system output, 285 
Indefinite integrals, table of, 366 
Independence, statistical, 12, 27 

more than two events, 28 

of random variables, 120 
Integrals, definite, table of, 367 

indefinite, table of, 366 
Integration, in the complex plane, 249 
Interchange of integrals, 288 
Intersection of sets, 15 


J 


Joint characteristic function, 134 
Joint probability, 111 
density function, 111 
distribution function, 111 
example of, 112 
expected values from, 113 
multi-dimensional Gaussian, 125 
Jordan's lemma, 392 


L 


Lag window, 263
Hamming, 265
Hanning, 281
rectangular, 264
Laplace distribution, 374 
Laplace transforms, table of, 370 
Laurent expansion, 389 
Level of significance, 159 
Limiter, 109 
Linear regression, 163 

example, 164 

least-squares, 163 
Linear system, 284 
Log-normal distribution, 78, 374 


M 


Marginal probability, 10 
density function, 113 

Matched filters, 345 
exponential pulse, 346 
periodic pulses, 348 
rectangular pulse, 345 

Mathematical tables, 365 
binomial coefficients, 377 


correlation function—spectral density pairs, 387

definite integrals, 367 

Fourier transforms, 368 

indefinite integrals, 366 

Laplace transforms, one-sided, 370 

normal probability distribution, 378 

Q-function, 380 

Student's t-distribution, 382 

trigonometric identities, 365 
Matrix, correlation, 216 

covariance, 218 
Maxwell distribution, 75, 375 
Mean-square value, 60 

from residues, 250 

from spectral density, 245 

of system output, 289 

table for, 246 
Mean value, 59 

from autocorrelation function, 196 

conditional, 90 

from spectral density, 237 

of system output, 288 
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Measurement of autocorrelation functions, 
201 
crosscorrelation functions, 214 
impulse response, 298, 304 
spectral density, 262 
variance, 149 
Mixed random processes, 173 
Moments, 60 
Mutually exclusive events, 9 
sets, 16 


N 


Noise figure, 329 
Noise, thermal agitation, 329 
Nondeterministic random process, 175 
Nonergodic random process, 179 
Nonstationary random process, 177 
Normal-bivariate probability distribution, 
375 
Normal probability distribution, 65, 375 
bivariate, 125 
table of, 378 
Normalized covariance, 123 
Null set, 14 


O 


Optimum linear systems, 331 
criteria for, 332 
maximum signal-to-noise ratio, 343 
minimum mean-square error, 350 
parameter adjustment, 336 
restrictions on, 335 

Outcome, of an experiment, 5 


P 


Parseval's theorem, 232 
Pascal distribution, 372 
Point process, 174 
Poisson distribution, 372 
Population, 143 


Power distribution, 71 
Probability, applications of, 2 
a posteriori, 26 
a priori, 26 
axiomatic, 19 
conditional, 11, 22 
continuous, 7 
definitions of, 7 
discrete, 7 
joint, 9 
marginal, 10 
relative-frequency, 7, 8 
space, 19 
total, 24 
Probability density functions, 54 
definition of, 54 
marginal, 113 
properties of, 54 
of the sum of random variables, 126 
table of, 371 
transformations, 56 
types of, 54 
Probability distribution function, 50 
definition of, 50 
marginal, 113 
properties of, 51 
table of, 371 
types of, 51 
Probability distributions, Bernoulli, 371 
beta, 373 
binomial, 372 
Cauchy, 373 
chi-square, 77, 373
Erlang, 85, 373 
exponential, 83, 374 
gamma, 85, 374 
Laplace, 374 
log-normal, 78, 374 
Maxwell, 75, 375 
normal, 64, 375 
normal-bivariate, 125, 375 
Pascal, 372 
Poisson, 372 
Rayleigh, 73, 375 
table of, 371 
uniform, 80, 376 
Weibull, 376 


Process, random, 170 
measurement of, 181 
types of, 171 


Q 


Q-function, definition of, 66 
evaluation of, 67 
table of, 380 

Quantizing error, 81 


R 


Radar, pulse amplitudes, 109 
crosscorrelation method, 213 
spectral density method, 327 

Raised-cosine signals, 270 
spectral density, 271 

Random process, 170 
measurement of, 181 
types of, 171 

Random variable, concept of, 48 
exponential, 83, 374 
Gaussian, 64, 375 
Rayleigh, 73, 375 
uniform, 80, 376 

Rational spectral density, 236 

Rayleigh distribution, 73, 375 
application to aiming problems, 75 
application to radar, 109 
application to traffic, 97 

Rectangular window function, 264 

Relative-frequency approach, 7, 8 

Reproducible density functions, 130 


S 


Sample, 143 

Sample function, 49 

Sample, mean, 143 
variance, 149 

Sampling theorem, 258 

Sampling theory, 142, 149 
examples, 146 
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Scatter diagram, 161 
Schwarz inequality, 344 
Set theory, 13 
Sets, complement of, 16 
difference of, 17 
disjoint, 16 
empty, 14 
equality of, 14 
intersection of, 15 
mutually exclusive, 16 
null, 14 
product of, 15 
sum of, 15 
union of, 15 
Venn diagram, 14 
Singular point, 389 
isolated, 389 
Space, of a set, 13 
probability, 19 
Spectral density, 230 
applications of, 269 
in the complex frequency plane, 243 
computer programs for estimating, 383 
for a constant, 237 
cross-spectral density, 259 
definition of, 233 
for a derivative, 242 
mean-square value from, 245 
measurement of, 262 
for a periodic process, 237 
properties of, 236 
for a pulse sequence, 240 
rational, 236 
relation to autocorrelation function, 251 
white noise, 256 
Standard deviation, 61 
Standardized variable, 123 
Stationary random processes, 177 
in the wide sense, 178 
Statistical independence, 27 
more than two events, 28 
of random variables, 120 
Statistical regularity, 9 
Statistics, applications of, 146, 147, 156, 
158, 160 
definition of, 141 
types of, 141 
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Student's t-distribution, 153 
table of, 382 

Subset, 13 
as events, 19 


Sums of random variables, mean of, 125 


density function, 126 
Gaussian, 130 
variance of, 125 

System analysis, 284 
frequency-domain methods, 307 
time-domain methods, 285 


T 


Thermal agitation noise, 329 

Time correlation functions, 191 
crosscorrelation, 208 

Time-domain analysis of systems, 285 
autocorrelation function, 292 
crosscorrelation function, 297 
examples of, 301 
mean value, 288 
mean-square value, 289 

Total probability, 24 

Traffic measurement, 97 

Transfer function, of a system, 307 

Transformation of variables, 56 
cubic, 58 
square-law, 58 

Transversal filter, 228 

Trial, 6, 32 

Trigonometric identities, table of, 365 

Two random variables, 110 




U 


Unbiased estimate, 144 
Uniform distribution, 80, 376 

applications of, 81, 82 
Union of sets, 15 


V 


Variables, random, 48 
Variance, 61 
estimate of, 149 
Venn diagram, 14 
Voltmeter, calibration, 108 
resistor, 96 


W 


Weibull distribution, 376 
White noise, 256 
approximation, 311 
autocorrelation function of, 257 
bandlimited, 257 
uses of, 314 
Wiener filter, 353 
error from, 355 
example of, 353 
Wiener-Khinchine relation, 253 


Z 


Zener diode circuit, 93 
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